Last updated: May 29, 2026
Application No. 18/360,291
DATA LINEAGE MANAGEMENT SYSTEM

Non-Final OA: §101, §103
Filed: Jul 27, 2023
Examiner: DUAN, VIVIAN WEIJIA
Art Unit: 2191
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Capital One Services LLC
OA Round: 3 (Non-Final)
Interview: Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 73% grant rate with a +54.2% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 11 resolved cases, 2023–2026
Examiner Intelligence

DUAN, VIVIAN WEIJIA
Career Allowance Rate: 73% (8 granted / 11 resolved), above average, +17.7% vs TC avg
Interview Lift: +54.2% (resolved cases with interview vs. without)
Typical timeline: 2y 7m avg prosecution, 14 currently pending
Statute-Specific Performance

§101: 14.0% (-26.0% vs TC avg)
§103: 81.4% (+41.4% vs TC avg)
§112: 4.7% (-35.3% vs TC avg)
Based on career data from 11 resolved cases
Office Action

§101 §103
DETAILED ACTION
This action is in response to the claims filed December 30, 2025. Claims 1-14 and 16-21 are pending. Claims 1, 8, and 14 are independent claims. Claims 1, 7, 8, 14, 16, 20, and 21 have been amended. Claim 15 has been cancelled.
The rejection under 35 U.S.C. 101 is maintained in view of Applicant’s arguments and amendments to the claims.
The rejection under 35 U.S.C. 103 is maintained in view of Applicant’s arguments and amendments to the claims.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1, the limitations “identify that a version of a source code has been uploaded to a software development hosting system” and “determine, using a data lineage analysis model, data lineage information associated with a dataset related to the version of the source code, wherein the data lineage information identifies one or more datasets from which data originates or to which data moves, and that are not explicitly defined in the version of the source code,…and wherein the data lineage information is determined based on learned correlation patterns identified…between source version changes and dataset lineage changes from a plurality of versions of source code” are functions that, under their broadest reasonable interpretation, recite the abstract idea of a mental process. The limitation encompasses a human mind carrying out the function through observation, evaluation, judgement, and/or opinion, or even with the aid of pen and paper. Thus, this limitation recites and falls within the “Mental Processes” grouping of abstract ideas under Prong 1.
Under Prong 2, this judicial exception is not integrated into a practical application. The additional elements “one or more memories”, “one or more processors, coupled to the one or more memories”, and “by the machine learning model” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer and/or mere computer components. See MPEP 2106.05(f). The additional limitation “retrieve the version of the source code from the software development hosting system service” is directed to the insignificant extra solution activity of merely gathering and transmitting data. See MPEP 2106.05(g). The additional limitation “wherein the data lineage analysis model includes a machine learning model that is trained based on respective data lineage information of datasets associated with a plurality of source codes, and wherein the data lineage information is determined based on learned patterns identified by the machine learning model from a plurality of versions of source code” merely applies a generic computer component of a generic machine learning model to the abstract idea, and thus amounts to mere application of a generic computer component which does not amount to practical application. See MPEP 2106.05(f). The limitation “automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention” is directed to the insignificant extra solution activity of mere storing and retrieving information in memory. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the recited judicial exception into a practical application and the claim is therefore directed to the judicial exception.
Under Step 2B, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “one or more memories”, “one or more processors, coupled to the one or more memories”, and “by the machine learning model” amount to no more than mere instruction, or generic computer/computer components to carry out the exception. For the limitation “retrieve the version of the source code from the software development hosting system service”, the courts have identified mere data gathering and transmitting as well-understood, routine, and conventional activity. See MPEP 2106.05(d). As discussed above with respect to integration of the abstract idea into a practical application, the additional limitation “wherein the data lineage analysis model includes a machine learning model that is trained based on respective data lineage information of datasets associated with a plurality of source codes, and wherein the data lineage information is determined based on learned patterns identified by the machine learning model from a plurality of versions of source code” merely applies a generic computer component of a generic machine learning model to the abstract idea, and thus amounts to mere application of a generic computer component which does not amount to significantly more. See MPEP 2106.05(f). For the limitation “automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention”, the courts have identified merely saving and retrieving information from memory as well-understood, routine, and conventional activity. See MPEP 2106.05(d). Accordingly, the claims are not patent eligible under 35 U.S.C. §101.
Regarding claim 2, the limitation “wherein, to identify that the version of the code has been uploaded to the software development hosting system device, … identify that the version of the source code has been uploaded to the software development hosting system device” is an additional mental step. The same generic computer/computer components are recited as in claim 1, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B. The limitation “using a data lineage webhook” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer, and/or computer component, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B.
Regarding claim 3, the limitation “wherein the software development hosting system device is associated with a Git repository hosting service” merely describes the software development hosting system device in the mental and data gathering step of claim 1, which is neither a practical application under Prong 2, nor amounts to significantly more under Step 2B.
Regarding claim 4, the limitation “wherein the enterprise data management system is associated with a metadata repository” merely describes the enterprise data management system in the data transmission step of claim 1, which is insignificant extra solution activity.
Claim 5 does not recite additional mental steps. The limitation “wherein, to determine the data lineage information, the one or more processors are further configured to extract the data lineage information using a code parser” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer, and/or computer component, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B.
Claim 6 does not recite additional mental steps. The limitation “wherein the one or more processors are further configured to train the machine learning model based on the respective data lineage information of the datasets associated with the plurality of source codes” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer, and/or computer component, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B.
Regarding claim 7, the limitations “extract one or more version-difference features representing changes between the version of the source code and the prior version of the source code” and “wherein, to determine the data lineage information associated with the dataset, …applying the learned correlation patterns to the extracted version-difference features to identify the one or more datasets from which data originates or to which data moves” are additional mental steps. The limitation “the one or more processors are configured to apply” amounts to mere instructions to apply a generic computer/computer component to the judicial exception, which does not amount to practical application under Prong 2, nor to significantly more under Step 2B, as explained above. The limitation “retrieve a prior version of the source code from the software development hosting system” amounts to the insignificant extra solution activity of mere data gathering and transmission, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B as explained above.
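For orientation, the kind of "version-difference features" recited in claim 7 can be illustrated with a minimal sketch. This is not the Applicant's implementation and is not drawn from the specification; the function name `version_difference_features` and the choice of features (added and removed lines) are assumptions made purely for illustration, using Python's standard `difflib` module.

```python
import difflib


def version_difference_features(prior: str, current: str) -> dict:
    """Hypothetical feature extractor: lists the lines added to and
    removed from a source file between two versions."""
    diff = difflib.ndiff(prior.splitlines(), current.splitlines())
    added, removed = [], []
    for line in diff:
        if line.startswith("+ "):
            added.append(line[2:])    # present only in the current version
        elif line.startswith("- "):
            removed.append(line[2:])  # present only in the prior version
        # lines starting with "  " (unchanged) or "? " (hints) are ignored
    return {"added": added, "removed": removed}


features = version_difference_features("a\nb\nc", "a\nc\nd")
print(features)
```

A learned model, as claimed, would consume features of this general kind; how the Applicant actually encodes them is not stated in this action.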
Regarding claim 8, the limitations “identifying, by a data lineage management device, that a source code has been uploaded to a Git repository hosting service” and “determining, using a data lineage analysis model, data lineage information associated with a dataset related to the source code, wherein the data lineage information identifies one or more datasets from which data originates or to which data moves, and that are not explicitly defined in the source code, …, and wherein the data lineage information is determined based on learned correlation patterns identified … between source version changes and dataset lineage changes from a plurality of versions of source code” are functions that, under their broadest reasonable interpretation, recite the abstract idea of a mental process. The limitations encompass a human mind carrying out the function through observation, evaluation, judgement, and/or opinion, or even with the aid of pen and paper. Thus, this limitation recites and falls within the “Mental Processes” grouping of abstract ideas under Prong 1.
Under Prong 2, this judicial exception is not integrated into a practical application. The additional limitation “retrieving, by the data lineage management device, the source code from the Git repository hosting service” is directed to the insignificant extra solution activity of merely gathering and transmitting data. See MPEP 2106.05(g). The additional limitations “wherein the data lineage analysis model includes a machine learning model that is trained based on respective data lineage information of datasets associated with a plurality of source codes, and wherein the data lineage information is determined based on learned patterns identified by the machine learning model from a plurality of versions of source code” and “by the machine learning model” merely apply a generic computer component of a generic machine learning model to the abstract idea, and thus amount to mere application of a generic computer component which does not amount to practical application. See MPEP 2106.05(f). The limitation “automatically posting, by the data lineage management device, the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention” is directed to the insignificant extra solution activity of mere storing and retrieving information in memory. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the recited judicial exception into a practical application and the claim is therefore directed to the judicial exception.
Under Step 2B, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. For the limitation “retrieving, by the data lineage management device, the source code from the Git repository hosting service”, the courts have identified mere data gathering and transmitting as well-understood, routine, and conventional activity. See MPEP 2106.05(d). As discussed above with respect to integration of the abstract idea into a practical application, the additional limitations “wherein the data lineage analysis model includes a machine learning model that is trained based on respective data lineage information of datasets associated with a plurality of source codes, and wherein the data lineage information is determined based on learned patterns identified by the machine learning model from a plurality of versions of source code” and “by the machine learning model” merely apply a generic computer component of a generic machine learning model to the abstract idea, and thus amount to mere application of a generic computer component which does not amount to significantly more. For the limitation “automatically posting, by the data lineage management device, the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention”, the courts have identified mere data gathering and transmitting as well-understood, routine, and conventional activity. See MPEP 2106.05(d). Accordingly, the claims are not patent eligible under 35 U.S.C. §101.
Regarding claim 9, the limitation “wherein identifying that the source code has been uploaded to the Git repository hosting service is performed using a data lineage webhook” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer, and/or computer component, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B. See MPEP 2106.05(f).
Regarding claim 10, the limitation “wherein determining the data lineage information associated with the dataset is based on the source code and the prior version of the source code” is a mental step. The limitation “retrieving a prior version of the source code from the Git repository hosting service” amounts to the insignificant extra solution activity of mere data gathering and transmission, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B as explained above.
Regarding claim 11, the limitation “wherein the enterprise data management system is associated with a metadata repository” merely describes the enterprise data management system in the data transmission step of claim 8, which is insignificant extra solution activity.
Claim 12 does not recite additional mental steps. The limitation “wherein determining the data lineage information includes extracting the data lineage information using a code parser” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer, and/or computer component, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B. See MPEP 2106.05(f).
Claim 13 does not recite additional mental steps. The limitation “training the machine learning model based on the respective data lineage information of the datasets associated with the plurality of source codes” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer, and/or computer component, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B. See MPEP 2106.05(f).
Regarding claim 14, the limitations “identify that a version of a source code has been uploaded to a software development hosting system device” and “determine, using a data lineage analysis model, data lineage information associated with a dataset related to the version of the source code, wherein the data lineage information identifies one or more datasets from which data originates or to which data moves, and that are not explicitly defined in the source code, …, and wherein the data lineage information is determined based on learned correlation patterns identified … between source version changes and dataset lineage changes from a plurality of versions of source code” are functions that, under their broadest reasonable interpretation, recite the abstract idea of a mental process. The limitations encompass a human mind carrying out the function through observation, evaluation, judgement, and/or opinion, or even with the aid of pen and paper. Thus, this limitation recites and falls within the “Mental Processes” grouping of abstract ideas under Prong 1.
Under Prong 2, this judicial exception is not integrated into a practical application. The additional limitations “a non-transitory computer-readable medium storing a set of instructions, the set of instructions comprising: one or more instructions…”, “train a machine learning model based on respective data lineage information of datasets associated with a plurality of source codes”, and “wherein the data lineage analysis model includes a machine learning model”, and “by the machine learning model” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer and/or mere computer components. See MPEP 2106.05(f). The additional limitation “retrieve the version of the source code from the software development hosting system device” is directed to the insignificant extra solution activity of merely gathering and transmitting data. See MPEP 2106.05(g). The limitation “automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input” is directed to the insignificant extra solution activity of mere storing and retrieving information in memory. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the recited judicial exception into a practical application and the claim is therefore directed to the judicial exception.
Under Step 2B, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “a non-transitory computer-readable medium storing a set of instructions, the set of instructions comprising: one or more instructions…”, “train a machine learning model based on respective data lineage information of datasets associated with a plurality of source codes”, “wherein the data lineage analysis model includes a machine learning model”, and “by the machine learning model” amount to no more than mere instruction, or generic computer/computer components to carry out the exception. For the limitation “retrieve the version of the source code from the software development hosting system device”, the courts have identified mere data gathering and transmitting as well-understood, routine, and conventional activity. See MPEP 2106.05(d). For the limitation “automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input”, the courts have identified merely saving and retrieving information from memory as well-understood, routine, and conventional activity. See MPEP 2106.05(d). Accordingly, the claims are not patent eligible under 35 U.S.C. §101.
Claim 15 does not recite additional mental steps. The same generic computer/computer components are recited as in claim 14, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B. The limitation “wherein the one or more instructions further cause the data management device to post the data lineage information to an enterprise data management system” amounts to the insignificant extra solution activity of mere data gathering and transmission, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B as explained above.
Regarding claim 16, the limitation “wherein the enterprise data management system is associated with a metadata repository” merely describes the enterprise data management system in the data transmission step of claim 1, which is insignificant extra solution activity.
Regarding claim 17, the limitation “wherein the one or more instructions, that cause the data lineage management device to identify that the version of the source code has been uploaded to the software development hosting system device, cause the data lineage management device to identify that the version of the source code has been uploaded to the software development hosting system device” is an additional mental step. The limitation “using a data lineage webhook” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer, and/or computer component, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B.
Regarding claim 18, the limitation “wherein the software development hosting system device is associated with a Git repository hosting service” merely describes the software development hosting system device in the mental and data gathering step of claim 1, which is neither a practical application under Prong 2, nor amounts to significantly more under Step 2B.
Claim 19 does not recite additional mental steps. The limitation “wherein the one or more instructions, that cause the data lineage management device to determine the data lineage information, cause the data lineage management device to extract the data lineage information using a code parser” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer, and/or computer component, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B.
Regarding claim 20, the limitation “extract one or more version-difference features representing changes between the version of the source code and the prior version of the source code” and “…determine the data lineage information associated with the dataset, …apply the learned correlation patterns to the extracted version-difference features to identify the one or more datasets from which data originates or to which data moves” is an additional mental step. The limitations “wherein the one or more instructions, that cause the data lineage management device to…” and “cause the data lineage management device to apply” amount to mere instructions to apply a generic computer/computer component to the judicial exception, which does not amount to practical application under Prong 2, nor to significantly more under Step 2B, as discussed above. The limitation “retrieve a prior version of the source code from the software development hosting system device” amounts to the insignificant extra solution activity of mere data gathering and transmission, which does not amount to practical application under Prong 2, nor amount to significantly more under Step 2B as explained above.
Claim 21 does not recite additional mental steps. The limitation “wherein the one or more processors are further configured to automatically trigger an update of lineage metadata stored in the enterprise data management system without user input in response to identifying the data lineage information” amounts to merely saving and retrieving information from memory, which does not amount to practical application under Prong 2, nor to significantly more, as discussed above.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-14 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 20190005117 A1 (hereinafter “Vasisht”), in view of “What is a webhook?” from redhat.com (hereinafter “Red Hat”), further in view of US 20220269884 A1 (hereinafter “Walters”), further in view of US 20160253340 A1 (hereinafter “Barth”).

Regarding claim 1, Vasisht discloses:
A system for managing data lineage, the system comprising (Fig. 2): 
one or more memories; and one or more processors, coupled to the one or more memories, configured to (Fig. 2):
- …
- …
- determine, using a data lineage analysis model, data lineage information associated with a dataset related to the version of the source code, wherein the data lineage information identifies one or more datasets from which data originates or to which data moves, and that are not explicitly defined in the version of the source code, wherein the data lineage analysis model includes a machine learning model … (Paragraph [0010], “In some embodiments, lineage detector may also parse the source code using various techniques, such as translating “select all” statements, resolving orphaned columns, resolving column aliases, and resolving references between multiple queries, etc. After parsing the source code, lineage detector may determine the data lineage of the specified target calculation based on the parsed source code [determine, using a data lineage analysis model, data lineage information associated with a dataset related to the version of the source code]”; Paragraph [0040], “Programs 282 may include one or more machine learning, trending, and/or pattern recognition applications (not shown) that cause the processor 260 to execute one or more process related to lineage detection [wherein the data lineage analysis model includes a machine learning model]”; Paragraph [0023], “Either term may be interpreted as the process of identifying the hierarchy, discovering the location, and monitoring the changes of all data elements of a database component (e.g., calculation) [wherein the data lineage information identifies one or more datasets from which data originates or to which data moves, and that are not explicitly defined in the version of the source code]”) [Examiner’s remarks: A data lineage analysis model is used to analyze the version of source code using a machine learning algorithm. Data lineage is data which indicates hierarchy, location, and changes (original dataset or dataset to which something moves) and is not indicated by version number.]; and
- …
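As a rough illustration of the parse-based lineage detection the examiner cites from Vasisht (parsing source code, then deriving lineage from the parsed result), the following toy sketch pulls source and target table names out of a single SQL statement with regular expressions. It is not Vasisht's method and does not handle the alias resolution, "select all" translation, or multi-query references that paragraph [0010] describes; the function name `extract_lineage` and the sample table names are hypothetical.

```python
import re

# Tables read by the statement: anything following FROM or JOIN.
_SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
# Tables written by the statement: anything following INSERT INTO.
_TARGET_RE = re.compile(r"\bINSERT\s+INTO\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_lineage(sql: str) -> dict:
    """Toy lineage extractor: returns the datasets data originates from
    ("sources") and the datasets data moves to ("targets")."""
    sources = sorted({name.lower() for name in _SOURCE_RE.findall(sql)})
    targets = sorted({name.lower() for name in _TARGET_RE.findall(sql)})
    return {"sources": sources, "targets": targets}


sample = (
    "INSERT INTO mart.daily_totals "
    "SELECT a.id, SUM(b.amt) FROM raw.accounts a "
    "JOIN raw.payments b ON a.id = b.acct GROUP BY a.id"
)
print(extract_lineage(sample))
```

A sketch like this only recovers datasets that are named in the code, which is the distinction the claim draws with datasets "not explicitly defined in the version of the source code".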
Vasisht does not explicitly disclose:
- identify that a version of a source code has been uploaded to a software development hosting system device;
- retrieve the version of the source code from the software development hosting system device;
- …that is trained based on respective data lineage information of datasets associated with a plurality of source codes, and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of source code…
- automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention.
However, Red Hat discloses:
- identify that a version of a source code has been uploaded to a software development hosting system device (Page 5, “A webhook can be set up to trigger communication whenever a change is made in the repository. For example, if a piece of code is updated and pushed to the Git repository, this event will trigger the webhook [identify that a version of source code has been uploaded to a software development hosting system]”) [Examiner’s remarks: A webhook identifies when a version of source code (code is updated and pushed) is uploaded to the software development hosting system (Git repository).];
- retrieve the version of the source code from the software development hosting system device (Page 5, “The repository then automatically sends the payload to the desired state engine’s webhook address, informing it of the code change…a system administrator can use webhooks to automatically deploy the latest changes on their managed hosts [retrieve the version of the source code from the software development hosting system device]”) [Examiner’s remarks: The relevant source code is retrieved (deployed) from the software development hosting system.];
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Red Hat into the teachings of Vasisht to include “identify that a version of a source code has been uploaded to a software development hosting system device” and “retrieve the version of the source code from the software development hosting system device”. As stated in Red Hat, “using webhooks in this manner allows desired state engines to keep close tabs on any infrastructure changes, without having to actively monitor repositories” (Page 5). Allowing automated tracking of repository changes allows for more efficient workflows. It also makes it easier to monitor new versions of source code that may not have data lineage. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with data monitoring and retrieval from a repository.
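The webhook flow the examiner relies on from Red Hat (a push to the repository triggers a payload sent to a listener, which then acts on the change) might look roughly like the sketch below on the receiving side. The payload shape is an assumption modeled on the push payloads commonly sent by Git hosting services (`ref`, `after`, `repository.full_name`, and per-commit `added`/`modified` lists); the function name `parse_push_event` and all sample values are hypothetical and are not taken from Red Hat, Vasisht, or the application.

```python
import json


def parse_push_event(body: bytes) -> dict:
    """Extract what a downstream lineage service would need from a push
    payload: which repository changed, at which commit, and which files."""
    event = json.loads(body)
    changed = sorted({
        path
        for commit in event.get("commits", [])
        for key in ("added", "modified")   # assumed per-commit file lists
        for path in commit.get(key, [])
    })
    return {
        "repository": event["repository"]["full_name"],
        "ref": event["ref"],
        "head_commit": event["after"],
        "changed_files": changed,
    }


sample_body = json.dumps({
    "ref": "refs/heads/main",
    "after": "abc123",
    "repository": {"full_name": "example-org/etl-jobs"},
    "commits": [{"added": ["jobs/load.sql"], "modified": ["jobs/clean.py"]}],
}).encode()
print(parse_push_event(sample_body))
```

In this sketch the listener only learns that a new version exists; fetching the changed files from the hosting service would be a separate step, corresponding to the claimed "retrieve" limitation.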
The combination of Vasisht and Red Hat does not explicitly disclose:
- … that is trained based on respective data lineage information of datasets associated with a plurality of source codes, and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of source code…
- automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention.
However, Walters discloses:
- …that is trained based on respective data lineage information of datasets associated with a plurality of [documents], and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of [documents] (Paragraph [0002], “obtain document lineage training data associated with a plurality of historical documents and corresponding lineage data of the plurality of historical documents, wherein the lineage data identifies respective sequences of versions of the historical documents; train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document, wherein the lineage analysis model includes a machine learning model that is trained based on respective lineages of versions of the plurality of historical documents that are identified in the corresponding lineage data […that is trained based on respective data lineage information of datasets associated with a plurality of [documents]]; receive a plurality of versions of a document, wherein the plurality of versions of the document comprise separate documents associated with a lineage of the document; process a first version of the plurality of versions to identify a first set of sections of the document and a second version of the plurality of versions to identify a second set of sections of the document; determine, using a similarity analysis model, that a first section from the first set of sections and a second section from the second set of sections correspond to a particular section of the document; determine, using the lineage analysis model, a lineage of the particular section; and indicate the lineage in association with the first section and the second section to facilitate editing of the particular section of the document [and wherein the data lineage information is determined based on 
learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of [documents]]”; Paragraph [0017], “In some implementations, the lineage data may include and/or indicate relative timing of respective versions of the historical documents (e.g., corresponding timestamps of the respective versions) and/or respective sequences of versions of the historical documents (e.g., corresponding sequence identifiers of the respective versions)”; Paragraph [0028], “Accordingly, the document management system may use the trained linear analysis model to determine a relationship (e.g., a timing relation and/or sequential relationship) of corresponding sections of the versions, which corresponds to a lineage of a particular section of the source document”) [Examiner’s remarks: The model is trained on the data lineage information and associated historical documents. Training in machine learning is used to find relationships (correlations) between the given inputs (here lineage data and historical documents). The given model may find a relationship between lineage changes and document changes. One of ordinary skill in the art understands that the source documents of Walters may be replaced with the source code of Vasisht to achieve the current invention.]…
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Walters into the combined teachings of Vasisht and Red Hat to include “…that is trained based on respective data lineage information of datasets associated with a plurality of [documents], and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of [documents]”. As stated in Walters, “Document lineage provides the ability to trace errors in the document, to access past portions or inputs associated with the document (e.g., for reviewing and/or analyzing the document). Document lineage can provide an audit trail of the document.” (paragraph [0001]). Training machine learning models allows the model to better predict parameters related to data lineage detection, and allows for lineage detection with less human intervention. Therefore, it would be obvious to combine lineage detection with a machine learning model with training machine learning models.
The combination of Vasisht, Red Hat, and Walters does not explicitly disclose:
- automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention.
However, Barth discloses:
- automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention (Paragraph [0009], “The platform, which may operate in a cloud computing architecture as an infrastructure shared by enterprise users [enterprise data management system]… In one embodiment, all (or substantially all) data in the user's analytics environment is registered with the metadata repository, which preferably maintains various types of metadata including, for example, status information (load dates, quality exceptions, access rights, etc.), definitions (business meaning, technical formats, etc.), lineage (data sources and processes creating a data set, etc.), and user data (user rights, access history, user comments, etc.). … Further, preferably the metadata repository automatically updates and provides access for self-service integration, preferably through a graphical user interface (GUI) for analysts to help them find, select, and customize data for their analyses. The system tracks data lineage and allows analysts to collaborate effectively, e.g., documenting and sharing insights into data structure and content [automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention]”; Paragraph [0050], “The platform tracks updates, e.g., to business definitions, data lineage, and changes to source system formats”) [Examiner’s remarks: Barth discloses an enterprise data management system (a metadata repository containing metadata shared by enterprise users). The data management system includes a metadata repository which stores at least lineage information as a form of metadata. 
The metadata repository automatically updates to provide information to users, which indicates that the lineage information in the repository is updated without manual intervention by the user.].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Barth into the combined teachings of Vasisht, Red Hat, and Walters to include “automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention”. As stated in Barth, “The system tracks data lineage and allows analysts to collaborate effectively, e.g., documenting and sharing insights into data structure and content. As the system is used, the metadata gets richer and more valuable, supporting additional automation and quality controls” (paragraph [0009]). Metadata regarding the data lineages of source code is important and useful for the entire enterprise to know, so that multiple people may use that information concurrently, as well as for automation. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with an enterprise data management system.

Regarding claim 2, the rejection of claim 1 is incorporated; and Vasisht does not explicitly disclose:
- wherein, to identify that the version of the source code has been uploaded to the software development hosting system device, the one or more processors are configured to identify that the version of the source code has been uploaded to the software development hosting system device using a data lineage webhook.
However, Red Hat discloses:
- wherein, to identify that the version of the source code has been uploaded to the software development hosting system device, the one or more processors are configured to identify that the version of the source code has been uploaded to the software development hosting system device using a data lineage webhook (Page 5, “A webhook can be set up to trigger communication whenever a change is made in the repository. For example, if a piece of code is updated and pushed to the Git repository, this event will trigger the webhook [wherein, to identify that the version of the source code has been uploaded to the software development hosting system device, the one or more processors are configured to identify that the version of the source code has been uploaded to the software development hosting system device using a data lineage webhook]”).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Red Hat into the teachings of Vasisht to include “wherein, to identify that the version of the source code has been uploaded to the software development hosting system device, the one or more processors are configured to identify that the version of the source code has been uploaded to the software development hosting system device using a data lineage webhook”. As stated in Red Hat, “using webhooks in this manner allows desired state engines to keep close tabs on any infrastructure changes, without having to actively monitor repositories” (page 5). Allowing automated tracking of repository changes allows for more efficient workflows. It also makes it easier to monitor new versions of source code that may not have data lineage. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with data monitoring and retrieval from a repository using webhooks.

Regarding claim 3, the rejection of claim 1 is incorporated; and Vasisht does not explicitly disclose:
- wherein the software development hosting system device is associated with a Git repository hosting service.
However, Red Hat discloses:
- wherein the software development hosting system device is associated with a Git repository hosting service (Page 5, “In this context, the git repository plays the role of the server app…A webhook can be set up to trigger communication whenever a change is made in the repository. For example, if a piece of code is updated and pushed to the Git repository, this event will trigger the webhook [wherein the software development hosting system device is associated with a git repository hosting service]”).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Red Hat into the teachings of Vasisht to include “wherein the software development hosting system device is associated with a Git repository hosting service”. As stated in Red Hat, “using webhooks in this manner allows desired state engines to keep close tabs on any infrastructure changes, without having to actively monitor repositories” (page 5). Allowing automated tracking of repository changes allows for more efficient workflows. It also makes it easier to monitor new versions of source code that may not have data lineage. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with data monitoring with a Git repository.

Regarding claim 4, the rejection of claim 1 is incorporated; and the combination of Vasisht, Red Hat, and Walters does not explicitly disclose:
- wherein the enterprise data management system is associated with a metadata repository.
However, Barth discloses:
- wherein the enterprise data management system is associated with a metadata repository (Paragraph [0009], “The platform, which may operate in a cloud computing architecture as an infrastructure shared by enterprise users, provides various automation and self-service features to enable those users to rapidly provision and manage an agile analytics environment for their Big Data. To this end, the platform includes a metadata repository, which tracks and manages all aspects of the data lifecycle, including storage management, access controls, encryption, compression, automated view creation, data format changes, data lineage, and refresh processing [wherein the enterprise data management system is associated with a metadata repository]”).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Barth into the combined teachings of Vasisht, Red Hat, and Walters to include “wherein the enterprise data management system is associated with a metadata repository”. As stated in Barth, “The system tracks data lineage and allows analysts to collaborate effectively, e.g., documenting and sharing insights into data structure and content. As the system is used, the metadata gets richer and more valuable, supporting additional automation and quality controls” (paragraph [0009]). Metadata regarding the data lineages of source code is important and useful for the entire enterprise to know, so that multiple people may use that information concurrently, as well as for automation. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with a metadata repository.

Regarding claim 5, the rejection of claim 1 is incorporated; and Vasisht discloses:
- wherein, to determine the data lineage information, the one or more processors are further configured to extract the data lineage information using a code parser (Paragraph [0041], “For example, lineage detection module 292 may utilize a third-party parser, such as ZQL™, JSqlParser™, or General SQL Parser™, to perform rudimentary parsing of source code. In other embodiments, lineage detection module 292 may provide this functionality without any assistance from third-party software”; Paragraph [0049], “For example, if a source code contains both Ruby™ and SQL instructions, lineage detector 108 may parse only the SQL instructions while ignoring or filtering out the remaining Ruby™ instructions. Ultimately, lineage detector 108 will parse source code at least in the form of SQL because it is useful to lineage detection” [wherein to determine the data lineage information, the one or more processors are further configured to extract the data lineage information using a code parser]).  
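For illustration only, the idea of extracting lineage from SQL source code with a parser can be sketched as follows. This is a deliberately simplified sketch under stated assumptions: it uses regular expressions rather than any of the third-party parsers named in Vasisht, the function name `extract_lineage` and the example table names are hypothetical, and it does not handle subqueries, aliases, comments, or “select all” translation.

```python
import re


def extract_lineage(sql: str) -> dict:
    """Return the tables written to and read from in a simple SQL statement.

    Simplification: a table is treated as a target if it follows INSERT INTO
    and as a source if it follows FROM or JOIN.
    """
    targets = re.findall(r"\binsert\s+into\s+([\w.]+)", sql, flags=re.IGNORECASE)
    sources = re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, flags=re.IGNORECASE)
    return {"targets": sorted(set(targets)), "sources": sorted(set(sources))}


# Hypothetical SQL statement moving data from two source tables into one target.
example_sql = """
INSERT INTO reporting.daily_totals
SELECT a.account_id, SUM(t.amount)
FROM raw.transactions t
JOIN raw.accounts a ON a.account_id = t.account_id
GROUP BY a.account_id
"""

lineage = extract_lineage(example_sql)
print(lineage)
```

A production parser would build a syntax tree and resolve column-level references; the sketch only shows the table-level notion of datasets from which data originates and to which data moves.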

Regarding claim 6, the rejection of claim 1 is incorporated; and Vasisht discloses:
- … source code (Paragraph [0010], “In some embodiments, lineage detector may also parse the source code using various techniques, such as translating “select all” statements, resolving orphaned columns, resolving column aliases, and resolving references between multiple queries, etc. After parsing the source code, lineage detector may determine the data lineage of the specified target calculation based on the parsed source code [… source code]”).
The combination of Vasisht and Red Hat does not explicitly disclose:
- wherein the one or more processors are further configured to train the machine learning model based on the respective data lineage information of the datasets associated with the plurality of …  
However, Walters discloses:
- wherein the one or more processors are further configured to train the machine learning model based on the respective data lineage information of the datasets associated with a plurality of [documents]… (Paragraph [0002], “…train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document, wherein the lineage analysis model includes a machine learning model that is trained based on respective lineages of versions of the plurality of historical documents that are identified in the corresponding lineage data; receive a plurality of versions of a document, wherein the plurality of versions of the document comprise separate documents associated with a lineage of the document; process a first version of the plurality of versions to identify a first set of sections of the document and a second version of the plurality of versions to identify a second set of sections of the document; determine, using a similarity analysis model, that a first section from the first set of sections and a second section from the second set of sections correspond to a particular section of the document; determine, using the lineage analysis model, a lineage of the particular section; and indicate the lineage in association with the first section and the second section to facilitate editing of the particular section of the document (wherein the one or more processors are further configured to train the machine learning model based on the respective data lineage information of the datasets associated with a plurality of [documents]…”) [Examiner’s remarks: The machine learning model trains on a plurality of versions of documents and their lineage data (datasets) related to a document. One of ordinary skill in the art may apply the same process to source code.].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Walters into the combined teachings of Vasisht and Red Hat to include “wherein the one or more processors are further configured to train the machine learning model based on the respective data lineage information of the datasets associated with a plurality of [documents]”. As stated in Walters, “Document lineage provides the ability to trace errors in the document, to access past portions or inputs associated with the document (e.g., for reviewing and/or analyzing the document). Document lineage can provide an audit trail of the document.” (paragraph [0001]). Training machine learning models allows the model to better predict parameters related to data lineage detection, and allows for lineage detection with less human intervention. Therefore, it would be obvious to combine lineage detection with a machine learning model with training machine learning models.

Regarding claim 7, the rejection of claim 1 is incorporated; and Vasisht further discloses:
retrieve a [version] of the source code from the software development hosting system device (Paragraph [0044], “Lineage detector 108 may also acquire one or more source code parameters that identify or specify the location of the body of source code to be analyzed”) [Examiner’s remarks: The detector is able to retrieve a specified version of the source code]; and
The combination of Vasisht and Red Hat does not explicitly disclose:
… prior version…
extract one or more version-difference features representing changes between the version of the [document] and the prior version of the [document], wherein, to determine the data lineage information associated with the dataset, the one or more processors are configured to applying the learned correlation patterns to the extracted version-difference features to identify the one or more datasets from which data originates or to which data moves.
However, Walters discloses:
… prior version (Paragraph [0002], “…train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document, wherein the lineage analysis model includes a machine learning model that is trained based on respective lineages of versions of the plurality of historical documents that are identified in the corresponding lineage data; receive a plurality of versions of a document, wherein the plurality of versions of the document comprise separate documents associated with a lineage of the document; process a first version of the plurality of versions to identify a first set of sections of the document and a second version of the plurality of versions to identify a second set of sections of the document; determine, using a similarity analysis model, that a first section from the first set of sections and a second section from the second set of sections correspond to a particular section of the document; determine, using the lineage analysis model, a lineage of the particular section; and indicate the lineage in association with the first section and the second section to facilitate editing of the particular section of the document”)…
extract one or more version-difference features representing changes between the version of the source code and the prior version of the source code, wherein, to determine the data lineage information associated with the dataset, the one or more processors are configured to applying the learned correlation patterns to the extracted version- difference features to identify the one or more datasets from which data originates or to which data moves (Paragraph [0002], “…train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document, wherein the lineage analysis model includes a machine learning model that is trained based on respective lineages of versions of the plurality of historical documents that are identified in the corresponding lineage data; receive a plurality of versions of a document, wherein the plurality of versions of the document comprise separate documents associated with a lineage of the document; process a first version of the plurality of versions to identify a first set of sections of the document and a second version of the plurality of versions to identify a second set of sections of the document; determine, using a similarity analysis model, that a first section from the first set of sections and a second section from the second set of sections correspond to a particular section of the document; determine, using the lineage analysis model, a lineage of the particular section; and indicate the lineage in association with the first section and the second section to facilitate editing of the particular section of the document”; Paragraph [0030], “In some implementations, the document management system may utilize any available metadata (e.g., timestamps, sequence identifiers, and/or user identifiers) to verify that the lineage is accurate. 
For example, the document management system may verify the lineage based on one or more of metadata indicating that D2 was generated prior to D1 and D3, metadata indicating that D1 was generated prior to D3, metadata indicating that D1 was generated from D2 (e.g., according to a first sequence identifier), and/or metadata indicating that D3 was generated from D1 (e.g., according to a second sequence identifier that is subsequent to the first sequence identifier)”) [Examiner’s remarks: Given two versions of a document, the machine learning model compares the difference between the two versions and uses the difference and lineage information to identify the data lineage (which dataset it originates from).].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Walters into the combined teachings of Vasisht and Red Hat to include “… prior version…” and “extract one or more version-difference features representing changes between the version of the [document] and the prior version of the [document], wherein, to determine the data lineage information associated with the dataset, the one or more processors are configured to applying the learned correlation patterns to the extracted version-difference features to identify the one or more datasets from which data originates or to which data moves”. As stated in Walters, “Document lineage provides the ability to trace errors in the document, to access past portions or inputs associated with the document (e.g., for reviewing and/or analyzing the document). Document lineage can provide an audit trail of the document.” (paragraph [0001]). Training machine learning models allows the model to better predict parameters related to data lineage detection, and allows for lineage detection with less human intervention. Therefore, it would be obvious to combine lineage detection with a machine learning model with data lineage determination using multiple versions.

Regarding claim 8, Vasisht discloses:
A method for managing a data lineage of a dataset, comprising:
…
determining, using a data lineage analysis model, data lineage information associated with a dataset related to the source code, wherein the data lineage information identifies one or more datasets from which data originates or to which data moves, and that are not explicitly defined in the source code, wherein the data lineage analysis model includes a machine learning model (Paragraph [0010], “In some embodiments, lineage detector may also parse the source code using various techniques, such as translating “select all” statements, resolving orphaned columns, resolving column aliases, and resolving references between multiple queries, etc. After parsing the source code, lineage detector may determine the data lineage of the specified target calculation based on the parsed source code [determining, using a data lineage analysis model, data lineage information associated with a dataset related to the source code]”; Paragraph [0040], “Programs 282 may include one or more machine learning, trending, and/or pattern recognition applications (not shown) that cause the processor 260 to execute one or more process related to lineage detection [wherein the data lineage analysis model includes a machine learning model]”; Paragraph [0023], “Either term may be interpreted as the process of identifying the hierarchy, discovering the location, and monitoring the changes of all data elements of a database component (e.g., calculation) [wherein the data lineage information identifies one or more datasets from which data originates or to which data moves, and that are not explicitly defined in the source code]”) [Examiner’s remarks: A data lineage analysis model is used to analyze the version of source code using a machine learning algorithm. Data lineage is data which indicates hierarchy, location, and changes (original dataset or dataset to which something moves) and is not indicated by version number.] …; and
…
Vasisht does not explicitly disclose:
- identifying, by a data lineage management device, that a source code has been uploaded to a Git repository hosting service; 
- retrieving, by the data lineage management device, the source code from the Git repository hosting service; 
- … that is trained based on respective data lineage information of datasets associated with a plurality of source codes, and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of source code
- automatically posting, by the data lineage management device, the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention.
However, Red Hat discloses:
identifying, by a data lineage management device, that a source code has been uploaded to a Git repository hosting service (Page 5, “A webhook can be set up to trigger communication whenever a change is made in the repository. For example, if a piece of code is updated and pushed to the Git repository, this event will trigger the webhook [identifying, by a data lineage management device, that a source code has been uploaded to a Git repository hosting service]”) [Examiner’s remarks: A webhook identifies when a version of source code (code is updated and pushed) is uploaded to the software development hosting system (Git repository).];
retrieving, by the data lineage management device, the source code from the Git repository hosting service (Page 5, “The repository then automatically sends the payload to the desired state engine’s webhook address, informing it of the code change…a system administrator can use webhooks to automatically deploy the latest changes on their managed hosts [retrieving, by the data lineage management device, the source code from the Git repository hosting service]”) [Examiner’s remarks: The relevant source code is retrieved (deployed) from the software development hosting system.];
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Red Hat into the teachings of Vasisht to include “identifying, by a data lineage management device, that a source code has been uploaded to a Git repository hosting service” and “retrieving, by the data lineage management device, the source code from the Git repository hosting service”. As stated in Red Hat, “using webhooks in this manner allows desired state engines to keep close tabs on any infrastructure changes, without having to actively monitor repositories” (Page 5). Allowing automated tracking of repository changes allows for more efficient workflows. It also makes it easier to monitor new versions of source code that may not have data lineage. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with data monitoring and retrieval from a repository.
The combination of Vasisht and Red Hat does not explicitly disclose:
- … that is trained based on respective data lineage information of datasets associated with a plurality of source codes, and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of source code…
- automatically posting, by the data lineage management device, the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention.
However, Walters discloses:
- … that is trained based on respective data lineage information of datasets associated with a plurality of source [documents], and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of source [documents] (Paragraph [0002], “obtain document lineage training data associated with a plurality of historical documents and corresponding lineage data of the plurality of historical documents, wherein the lineage data identifies respective sequences of versions of the historical documents; train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document, wherein the lineage analysis model includes a machine learning model that is trained based on respective lineages of versions of the plurality of historical documents that are identified in the corresponding lineage data [that is trained based on respective data lineage information of datasets associated with a plurality of source [documents]]; receive a plurality of versions of a document, wherein the plurality of versions of the document comprise separate documents associated with a lineage of the document; process a first version of the plurality of versions to identify a first set of sections of the document and a second version of the plurality of versions to identify a second set of sections of the document; determine, using a similarity analysis model, that a first section from the first set of sections and a second section from the second set of sections correspond to a particular section of the document; determine, using the lineage analysis model, a lineage of the particular section; and indicate the lineage in association with the first section and the second section to facilitate editing of the particular section of the document [and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of source [documents]]”; Paragraph [0017], “In some implementations, the lineage data may include and/or indicate relative timing of respective versions of the historical documents (e.g., corresponding timestamps of the respective versions) and/or respective sequences of versions of the historical documents (e.g., corresponding sequence identifiers of the respective versions)”; Paragraph [0028], “Accordingly, the document management system may use the trained linear analysis model to determine a relationship (e.g., a timing relation and/or sequential relationship) of corresponding sections of the versions, which corresponds to a lineage of a particular section of the source document”) [Examiner’s remarks: The model is trained on the data lineage information and associated historical documents. Training in machine learning is used to find relationships (correlations) between the given inputs (here lineage data and historical documents). The given model may find a relationship between lineage changes and document changes. One of ordinary skill in the art understands that the source documents of Walters may be replaced with the source code of Vasisht to achieve the current invention.]…
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Walters into the combined teachings of Vasisht and Red Hat to include “that is trained based on respective data lineage information of datasets associated with a plurality of source [documents], and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of source [documents]”. As stated in Walters, “Document lineage provides the ability to trace errors in the document, to access past portions or inputs associated with the document (e.g., for reviewing and/or analyzing the document). Document lineage can provide an audit trail of the document.” (paragraph [0001]). Training machine learning models allows the model to better predict parameters related to data lineage detection, and allows for lineage detection with less human intervention. Therefore, it would be obvious to combine lineage detection with a machine learning model with training machine learning models.
The combination of Vasisht, Red Hat, and Walters does not explicitly disclose: 
- automatically posting, by the data lineage management device, the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention.
However, Barth discloses:
- automatically posting, by the data lineage management device, the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention (Paragraph [0009], “The platform, which may operate in a cloud computing architecture as an infrastructure shared by enterprise users [enterprise data management system]… In one embodiment, all (or substantially all) data in the user's analytics environment is registered with the metadata repository, which preferably maintains various types of metadata including, for example, status information (load dates, quality exceptions, access rights, etc.), definitions (business meaning, technical formats, etc.), lineage (data sources and processes creating a data set, etc.), and user data (user rights, access history, user comments, etc.). … Further, preferably the metadata repository automatically updates and provides access for self-service integration, preferably through a graphical user interface (GUI) for analysts to help them find, select, and customize data for their analyses. The system tracks data lineage and allows analysts to collaborate effectively, e.g., documenting and sharing insights into data structure and content [automatically posting, by the data lineage management device, the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention]”; Paragraph [0050], “The platform tracks updates, e.g., to business definitions, data lineage, and changes to source system formats”) [Examiner’s remarks: Barth discloses an enterprise data management system (a metadata repository containing metadata shared by enterprise users). The data management system includes a metadata repository which stores at least lineage information as a form of metadata. The metadata repository automatically updates to provide information to users, which indicates that the lineage information in the repository is updated without manual intervention by the user.].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Barth into the combined teachings of Vasisht, Red Hat, and Walters to include “automatically posting, by the data lineage management device, the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention”. As stated in Barth, “The system tracks data lineage and allows analysts to collaborate effectively, e.g., documenting and sharing insights into data structure and content. As the system is used, the metadata gets richer and more valuable, supporting additional automation and quality controls” (paragraph [0009]). Metadata regarding the data lineages of source code is important and useful for the entire enterprise to know, so that multiple people may use that information concurrently, as well as for automation. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with an enterprise data management system.

Regarding claim 9, the rejection of claim 8 is incorporated; and Vasisht does not explicitly disclose:
- wherein identifying that the source code has been uploaded to the Git repository hosting service is performed using a data lineage webhook.  
However, Red Hat discloses:
- wherein identifying that the source code has been uploaded to the Git repository hosting service is performed using a data lineage webhook (Page 5, “A webhook can be set up to trigger communication whenever a change is made in the repository. For example, if a piece of code is updated and pushed to the Git repository, this event will trigger the webhook [wherein, to identify that the version of the source code has been uploaded to the software development hosting system device, the one or more processors are configured to identify that the version of the source code has been uploaded to the software development hosting system device using a data lineage webhook]”).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Red Hat into the teachings of Vasisht to include “wherein identifying that the source code has been uploaded to the Git repository hosting service is performed using a data lineage webhook”. As stated in Red Hat, “using webhooks in this manner allows desired state engines to keep close tabs on any infrastructure changes, without having to actively monitor repositories” (page 5). Allowing automated tracking of repository changes allows for more efficient workflows. It also makes it easier to monitor new versions of source code that may not have data lineage. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with data monitoring and retrieval from a repository using webhooks.

Regarding claim 10, the rejection of claim 8 is incorporated; and Vasisht further discloses:
- further comprising retrieving a [version] of the source code from the Git repository hosting service (Paragraph [0044], “Lineage detector 108 may also acquire one or more source code parameters that identify or specify the location of the body of source code to be analyzed”) [Examiner’s remarks: The detector is able to retrieve a specified version of the source code]…
The combination of Vasisht and Red Hat does not explicitly disclose:
- … wherein determining the data lineage information associated with the dataset is based on the source code and the prior version of the source code.
However, Walters discloses:
- … wherein determining the data lineage information associated with the dataset is based on the [document] and the prior version of the [document] (Paragraph [0002], “…train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document, wherein the lineage analysis model includes a machine learning model that is trained based on respective lineages of versions of the plurality of historical documents that are identified in the corresponding lineage data; receive a plurality of versions of a document, wherein the plurality of versions of the document comprise separate documents associated with a lineage of the document; process a first version of the plurality of versions to identify a first set of sections of the document and a second version of the plurality of versions to identify a second set of sections of the document; determine, using a similarity analysis model, that a first section from the first set of sections and a second section from the second set of sections correspond to a particular section of the document; determine, using the lineage analysis model, a lineage of the particular section; and indicate the lineage in association with the first section and the second section to facilitate editing of the particular section of the document [wherein determining the data lineage information associated with the dataset is based on the [document] and the prior version of the [document]]”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Walters into the combined teachings of Vasisht and Red Hat to include “wherein determining the data lineage information associated with the dataset is based on the [document] and the prior version of the [document]”. As stated in Walters, “Document lineage provides the ability to trace errors in the document, to access past portions or inputs associated with the document (e.g., for reviewing and/or analyzing the document). Document lineage can provide an audit trail of the document.” (paragraph [0001]). Training machine learning models allows the model to better predict parameters related to data lineage detection, and allows for lineage detection with less human intervention. Therefore, it would be obvious to combine lineage detection with a machine learning model with data lineage determination using multiple versions.

Regarding claim 11, the rejection of claim 8 is incorporated; and the combination of Vasisht, Red Hat, and Walters does not explicitly disclose:
- wherein the enterprise data management system is associated with a metadata repository.  
However, Barth discloses:
- wherein the enterprise data management system is associated with a metadata repository (Paragraph [0009], “The platform, which may operate in a cloud computing architecture as an infrastructure shared by enterprise users, provides various automation and self-service features to enable those users to rapidly provision and manage an agile analytics environment for their Big Data. To this end, the platform includes a metadata repository, which tracks and manages all aspects of the data lifecycle, including storage management, access controls, encryption, compression, automated view creation, data format changes, data lineage, and refresh processing [wherein the enterprise data management system is associated with a metadata repository]”).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Barth into the combined teachings of Vasisht, Red Hat, and Walters to include “wherein the enterprise data management system is associated with a metadata repository”. As stated in Barth, “The system tracks data lineage and allows analysts to collaborate effectively, e.g., documenting and sharing insights into data structure and content. As the system is used, the metadata gets richer and more valuable, supporting additional automation and quality controls” (paragraph [0009]). Metadata regarding the data lineages of source code is important and useful for the entire enterprise to know, so that multiple people may use that information concurrently, as well as for automation. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with a metadata repository.

Regarding claim 12, the rejection of claim 8 is incorporated; and Vasisht further discloses:
- wherein determining the data lineage information includes extracting the data lineage information using a code parser (Paragraph [0041], “For example, lineage detection module 292 may utilize a third-party parser, such as ZQL™, JSqlParser™, or General SQL Parser™, to perform rudimentary parsing of source code. In other embodiments, lineage detection module 292 may provide this functionality without any assistance from third-party software”; Paragraph [0049], “For example, if a source code contains both Ruby™ and SQL instructions, lineage detector 108 may parse only the SQL instructions while ignoring or filtering out the remaining Ruby™ instructions. Ultimately, lineage detector 108 will parse source code at least in the form of SQL because it is useful to lineage detection” [wherein determining the data lineage information includes extracting the data lineage information using a code parser]).  

Regarding claim 13, the rejection of claim 8 is incorporated; and Vasisht discloses:
- …source code (Paragraph [0010], “In some embodiments, lineage detector may also parse the source code using various techniques, such as translating “select all” statements, resolving orphaned columns, resolving column aliases, and resolving references between multiple queries, etc. After parsing the source code, lineage detector may determine the data lineage of the specified target calculation based on the parsed source code [… source code]”).
The combination of Vasisht and Red Hat does not explicitly disclose:
- further comprising training the machine learning model based on the respective data lineage information of the datasets associated with the plurality of....
However, Walters discloses:
- further comprising training the machine learning model based on the respective data lineage information of the datasets associated with the plurality of [databases] (Paragraph [0002], “…train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document, wherein the lineage analysis model includes a machine learning model that is trained based on respective lineages of versions of the plurality of historical documents that are identified in the corresponding lineage data; receive a plurality of versions of a document, wherein the plurality of versions of the document comprise separate documents associated with a lineage of the document; process a first version of the plurality of versions to identify a first set of sections of the document and a second version of the plurality of versions to identify a second set of sections of the document; determine, using a similarity analysis model, that a first section from the first set of sections and a second section from the second set of sections correspond to a particular section of the document; determine, using the lineage analysis model, a lineage of the particular section; and indicate the lineage in association with the first section and the second section to facilitate editing of the particular section of the document [further comprising training the machine learning model based on the respective data lineage information of the datasets associated with the plurality of [databases]]”) [Examiner’s remarks: The machine learning model trains on a plurality of versions of documents and their lineage data (datasets) related to a document. One of ordinary skill in the art may apply the same process to source code.].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Walters into the combined teachings of Vasisht and Red Hat to include “further comprising training the machine learning model based on the respective data lineage information of the datasets associated with the plurality of [databases]”. As stated in Walters, “Document lineage provides the ability to trace errors in the document, to access past portions or inputs associated with the document (e.g., for reviewing and/or analyzing the document). Document lineage can provide an audit trail of the document.” (paragraph [0001]). Training machine learning models allows the model to better predict parameters related to data lineage detection, and allows for lineage detection with less human intervention. Therefore, it would be obvious to combine lineage detection with a machine learning model with training machine learning models.

Regarding claim 14, Vasisht discloses:
A non-transitory computer-readable medium storing a set of instructions, the set of instructions comprising: one or more instructions that, when executed by one or more processors of a data lineage management device, cause the data lineage management device to (Paragraph [0037], Paragraph [0034]):
- … source code (Paragraph [0010], “In some embodiments, lineage detector may also parse the source code using various techniques, such as translating “select all” statements, resolving orphaned columns, resolving column aliases, and resolving references between multiple queries, etc. After parsing the source code, lineage detector may determine the data lineage of the specified target calculation based on the parsed source code [… source code]”)
…
- determine, using a data lineage analysis model, data lineage information associated with a dataset related to the version of the source code, wherein the data lineage information identifies one or more datasets from which data originates or to which data moves, and that are not explicitly defined in the source code, wherein the data lineage analysis model includes the machine learning model, …; and
…
Vasisht does not explicitly disclose:
- train a machine learning model based on respective data lineage information of datasets associated with a plurality of source codes;
- identify that a version of a source code has been uploaded to a software development hosting system device;
- retrieve the version of the source code from the software development hosting system device; 
- …and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of source code; and
- automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input.
However, Red Hat discloses:
- identify that a version of a source code has been uploaded to a software development hosting system device (Page 5, “A webhook can be set up to trigger communication whenever a change is made in the repository. For example, if a piece of code is updated and pushed to the Git repository, this event will trigger the webhook [identify that a version of source code has been uploaded to a software development hosting system]”) [Examiner’s remarks: A webhook identifies when a version of source code (code is updated and pushed) is uploaded to the software development hosting system (Git repository).];
- retrieve the version of the source code from the software development hosting system device (Page 5, “The repository then automatically sends the payload to the desired state engine’s webhook address, informing it of the code change…a system administrator can use webhooks to automatically deploy the latest changes on their managed hosts [retrieve the version of the source code from the software development hosting system device]”) [Examiner’s remarks: The relevant source code is retrieved (deployed) from the software development hosting system.];
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Red Hat into the teachings of Vasisht to include “identify that a version of a source code has been uploaded to a software development hosting system device” and “retrieve the version of the source code from the software development hosting system device”. As stated in Red Hat, “using webhooks in this manner allows desired state engines to keep close tabs on any infrastructure changes, without having to actively monitor repositories” (Page 5). Allowing automated tracking of repository changes allows for more efficient workflows. It also makes it easier to monitor new versions of source code that may not have data lineage. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with data monitoring and retrieval from a repository.
The combination of Vasisht and Red Hat does not explicitly disclose:
- train a machine learning model based on respective data lineage information of datasets associated with a plurality of source codes;
- … and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of source code; and
- automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input.
However, Walters discloses:
- train a machine learning model based on respective data lineage information of datasets associated with a plurality of [documents] (Paragraph [0002], “…train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document, wherein the lineage analysis model includes a machine learning model that is trained based on respective lineages of versions of the plurality of historical documents that are identified in the corresponding lineage data; receive a plurality of versions of a document, wherein the plurality of versions of the document comprise separate documents associated with a lineage of the document; process a first version of the plurality of versions to identify a first set of sections of the document and a second version of the plurality of versions to identify a second set of sections of the document; determine, using a similarity analysis model, that a first section from the first set of sections and a second section from the second set of sections correspond to a particular section of the document; determine, using the lineage analysis model, a lineage of the particular section; and indicate the lineage in association with the first section and the second section to facilitate editing of the particular section of the document [train a machine learning model based on respective data lineage information of datasets associated with a plurality of [documents]]”) [Examiner’s remarks: A lineage analysis model (machine learning model) is trained based on lineages of versions of a plurality of historical documents. One of ordinary skill in the art may perform the same process using source code instead of other computer documents.];
- … and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of [documents] (Paragraph [0002], “obtain document lineage training data associated with a plurality of historical documents and corresponding lineage data of the plurality of historical documents, wherein the lineage data identifies respective sequences of versions of the historical documents; train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document, wherein the lineage analysis model includes a machine learning model that is trained based on respective lineages of versions of the plurality of historical documents that are identified in the corresponding lineage data; receive a plurality of versions of a document, wherein the plurality of versions of the document comprise separate documents associated with a lineage of the document; process a first version of the plurality of versions to identify a first set of sections of the document and a second version of the plurality of versions to identify a second set of sections of the document; determine, using a similarity analysis model, that a first section from the first set of sections and a second section from the second set of sections correspond to a particular section of the document; determine, using the lineage analysis model, a lineage of the particular section; and indicate the lineage in association with the first section and the second section to facilitate editing of the particular section of the document [and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of [documents]]”; Paragraph [0017], “In some implementations, the lineage data may include and/or indicate relative timing of respective versions of the historical documents (e.g., corresponding timestamps of the respective versions) and/or respective sequences of versions of the historical documents (e.g., corresponding sequence identifiers of the respective versions)”; Paragraph [0028], “Accordingly, the document management system may use the trained linear analysis model to determine a relationship (e.g., a timing relation and/or sequential relationship) of corresponding sections of the versions, which corresponds to a lineage of a particular section of the source document”) [Examiner’s remarks: The model is trained on the data lineage information and associated historical documents. Training in machine learning is used to find relationships (correlations) between the given inputs (here lineage data and historical documents). The given model may find a relationship between lineage changes and document changes. One of ordinary skill in the art understands that the source documents of Walters may be replaced with the source code of Vasisht to achieve the current invention.]…
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Walters into the combined teachings of Vasisht and Red Hat to include “train a machine learning model based on respective data lineage information of datasets associated with a plurality of [documents]” and “and wherein the data lineage information is determined based on learned correlation patterns identified by the machine learning model between source version changes and dataset lineage changes from a plurality of versions of [documents]”. As stated in Walters, “Document lineage provides the ability to trace errors in the document, to access past portions or inputs associated with the document (e.g., for reviewing and/or analyzing the document). Document lineage can provide an audit trail of the document.” (paragraph [0001]). Training machine learning models allows the model to better predict parameters related to data lineage detection, and allows for lineage detection with less human intervention. Therefore, it would be obvious to combine lineage detection with a machine learning model with training machine learning models.
The combination of Vasisht, Red Hat, and Walters does not explicitly disclose:
- automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input.
However, Barth discloses:
- automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input (Paragraph [0009], “The platform, which may operate in a cloud computing architecture as an infrastructure shared by enterprise users [enterprise data management system]… In one embodiment, all (or substantially all) data in the user's analytics environment is registered with the metadata repository, which preferably maintains various types of metadata including, for example, status information (load dates, quality exceptions, access rights, etc.), definitions (business meaning, technical formats, etc.), lineage (data sources and processes creating a data set, etc.), and user data (user rights, access history, user comments, etc.). … Further, preferably the metadata repository automatically updates and provides access for self-service integration, preferably through a graphical user interface (GUI) for analysts to help them find, select, and customize data for their analyses. The system tracks data lineage and allows analysts to collaborate effectively, e.g., documenting and sharing insights into data structure and content [automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input]”; Paragraph [0050], “The platform tracks updates, e.g., to business definitions, data lineage, and changes to source system formats”) [Examiner’s remarks: Barth discloses an enterprise data management system (a metadata repository containing metadata shared by enterprise users). The data management system includes a metadata repository which stores at least lineage information as a form of metadata. The metadata repository automatically updates to provide information to users, which indicates that the lineage information in the repository is updated without manual intervention by the user.].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Barth into the combined teachings of Vasisht, Red Hat, and Walters to include “automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input”. As stated in Barth, “The system tracks data lineage and allows analysts to collaborate effectively, e.g., documenting and sharing insights into data structure and content. As the system is used, the metadata gets richer and more valuable, supporting additional automation and quality controls” (paragraph [0009]). Metadata regarding the data lineages of source code is important and useful for the entire enterprise to know, so that multiple people may use that information concurrently, as well as for automation. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with an enterprise data management system.

Regarding claim 16, the rejection of claim 14 is incorporated; and the combination of Vasisht, Red Hat, and Walters does not explicitly disclose:
- wherein the enterprise data management system is associated with a metadata repository.  
However, Barth discloses:
- wherein the enterprise data management system is associated with a metadata repository (Paragraph [0009], “The platform, which may operate in a cloud computing architecture as an infrastructure shared by enterprise users, provides various automation and self-service features to enable those users to rapidly provision and manage an agile analytics environment for their Big Data. To this end, the platform includes a metadata repository, which tracks and manages all aspects of the data lifecycle, including storage management, access controls, encryption, compression, automated view creation, data format changes, data lineage, and refresh processing [wherein the enterprise data management system is associated with a metadata repository]”).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Barth into the combined teachings of Vasisht, Red Hat, and Walters to include “wherein the enterprise data management system is associated with a metadata repository”. As stated in Barth, “The system tracks data lineage and allows analysts to collaborate effectively, e.g., documenting and sharing insights into data structure and content. As the system is used, the metadata gets richer and more valuable, supporting additional automation and quality controls” (paragraph [0009]). Metadata regarding the data lineages of source code is important and useful for the entire enterprise to know, so that multiple people may use that information concurrently, as well as for automation. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with a metadata repository.

Regarding claim 17, the rejection of claim 14 is incorporated; and Vasisht does not explicitly disclose:
- wherein the one or more instructions, that cause the data lineage management device to identify that the version of the source code has been uploaded to the software development hosting system device, cause the data lineage management device to identify that the version of the source code has been uploaded to the software development hosting system device using a data lineage webhook.  
However, Red Hat discloses:
- wherein the one or more instructions, that cause the data lineage management device to identify that the version of the source code has been uploaded to the software development hosting system device, cause the data lineage management device to identify that the version of the source code has been uploaded to the software development hosting system device using a data lineage webhook (Page 5, “A webhook can be set up to trigger communication whenever a change is made in the repository. For example, if a piece of code is updated and pushed to the Git repository, this event will trigger the webhook [wherein the one or more instructions, that cause the data lineage management device to identify that the version of the source code has been uploaded to the software development hosting system device, cause the data lineage management device to identify that the version of the source code has been uploaded to the software development hosting system device using a data lineage webhook]”).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Red Hat into the teachings of Vasisht to include “wherein the one or more instructions, that cause the data lineage management device to identify that the version of the source code has been uploaded to the software development hosting system device, cause the data lineage management device to identify that the version of the source code has been uploaded to the software development hosting system device using a data lineage webhook”. As stated in Red Hat, “using webhooks in this manner allows desired state engines to keep close tabs on any infrastructure changes, without having to actively monitor repositories” (page 5). Allowing automated tracking of repository changes allows for more efficient workflows. It also makes it easier to monitor new versions of source code that may not have data lineage. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with data monitoring and retrieval from a repository using webhooks.
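For illustration only, the push-triggered webhook described in the Red Hat passage can be sketched as follows. This is a minimal sketch and not the method of any cited reference or of the application. The payload field names (`after`, `commits`, `added`, `modified`) are assumptions modeled on a typical Git hosting push event, and the function names and file suffixes are hypothetical.

```python
import json


def changed_source_files(push_payload: dict, suffixes=(".sql", ".py")) -> list:
    """Return source files added or modified by a push, in first-seen order.

    The payload layout is an assumption, not taken from the cited art.
    """
    seen = []
    for commit in push_payload.get("commits", []):
        for path in commit.get("added", []) + commit.get("modified", []):
            if path.endswith(suffixes) and path not in seen:
                seen.append(path)
    return seen


def on_webhook(body: bytes) -> dict:
    """Hypothetical handler: called with the raw request body of a push event."""
    payload = json.loads(body)
    return {
        "version": payload.get("after"),  # identifier of the newly pushed version
        "files": changed_source_files(payload),
    }
```

In such a sketch the handler runs only when the hosting service delivers an event, so nothing has to poll the repository.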

Regarding claim 18, the rejection of claim 14 is incorporated; and Vasisht does not explicitly disclose:
- wherein the software development hosting system device is associated with a Git repository hosting service.  
However, Red Hat discloses:
- wherein the software development hosting system device is associated with a Git repository hosting service (Page 5, “In this context, the git repository plays the role of the server app…A webhook can be set up to trigger communication whenever a change is made in the repository. For example, if a piece of code is updated and pushed to the Git repository, this event will trigger the webhook [wherein the software development hosting system device is associated with a git repository hosting service]”).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Red Hat into the teachings of Vasisht to include “wherein the software development hosting system device is associated with a Git repository hosting service”. As stated in Red Hat, “using webhooks in this manner allows desired state engines to keep close tabs on any infrastructure changes, without having to actively monitor repositories” (page 5). Allowing automated tracking of repository changes allows for more efficient workflows. It also makes it easier to monitor new versions of source code that may not have data lineage. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with data monitoring with a Git repository.

Regarding claim 19, the rejection of claim 14 is incorporated; and Vasisht further discloses:
- wherein the one or more instructions, that cause the data lineage management device to determine the data lineage information, cause the data lineage management device to extract the data lineage information using a code parser (Paragraph [0041], “For example, lineage detection module 292 may utilize a third-party parser, such as ZQL™, JSqlParser™, or General SQL Parser™, to perform rudimentary parsing of source code. In other embodiments, lineage detection module 292 may provide this functionality without any assistance from third-party software”; Paragraph [0049], “For example, if a source code contains both Ruby™ and SQL instructions, lineage detector 108 may parse only the SQL instructions while ignoring or filtering out the remaining Ruby™ instructions. Ultimately, lineage detector 108 will parse source code at least in the form of SQL because it is useful to lineage detection” [wherein the one or more instructions, that cause the data lineage management device to determine the data lineage information, cause the data lineage management device to extract the data lineage information using a code parser]).  
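For illustration only, the "rudimentary parsing" of SQL to recover lineage that Vasisht describes can be sketched with regular expressions. This is a hedged stand-in. It does not use ZQL, JSqlParser, or General SQL Parser, and it is not Vasisht's lineage detector. The function name and the table names in the usage note are hypothetical.

```python
import re

# Rudimentary patterns: a real SQL parser handles subqueries, CTEs, quoting, etc.
INSERT_RE = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql: str) -> dict:
    """Return the tables a statement writes to (targets) and reads from (sources)."""
    targets = INSERT_RE.findall(sql)
    sources = SOURCE_RE.findall(sql)
    return {"targets": sorted(set(targets)), "sources": sorted(set(sources))}
```

Under these assumptions, a statement that inserts into `mart.sales` from `raw.orders` joined with `raw.customers` yields one target and two sources.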

Regarding claim 20, the rejection of claim 14 is incorporated; and Vasisht further discloses:
retrieve a [version] of the source code from the software development hosting system device (Paragraph [0044], “Lineage detector 108 may also acquire one or more source code parameters that identify or specify the location of the body of source code to be analyzed”) [Examiner’s remarks: The detector is able to retrieve specified version of the source code]; and 
The combination of Vasisht and Red Hat does not explicitly disclose:
… prior version…
extract one or more version-difference features representing changes between the version of the source code and the prior version of the source code, wherein the one or more instructions, that cause the data lineage management device to determine the data lineage information associated with the dataset, cause the data lineage management device to apply the learned correlation patterns to the extracted version-difference features to identify the one or more datasets from which data originates or to which data moves.
However, Walters discloses:
… prior version (Paragraph [0002], “…train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document, wherein the lineage analysis model includes a machine learning model that is trained based on respective lineages of versions of the plurality of historical documents that are identified in the corresponding lineage data; receive a plurality of versions of a document, wherein the plurality of versions of the document comprise separate documents associated with a lineage of the document; process a first version of the plurality of versions to identify a first set of sections of the document and a second version of the plurality of versions to identify a second set of sections of the document; determine, using a similarity analysis model, that a first section from the first set of sections and a second section from the second set of sections correspond to a particular section of the document; determine, using the lineage analysis model, a lineage of the particular section; and indicate the lineage in association with the first section and the second section to facilitate editing of the particular section of the document”)…
extract one or more version-difference features representing changes between the version of the [document] and the prior version of the [document], wherein the one or more instructions, that cause the data lineage management device to determine the data lineage information associated with the dataset, cause the data lineage management device to apply the learned correlation patterns to the extracted version-difference features to identify the one or more datasets from which data originates or to which data moves (Paragraph [0002], “…train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document, wherein the lineage analysis model includes a machine learning model that is trained based on respective lineages of versions of the plurality of historical documents that are identified in the corresponding lineage data; receive a plurality of versions of a document, wherein the plurality of versions of the document comprise separate documents associated with a lineage of the document; process a first version of the plurality of versions to identify a first set of sections of the document and a second version of the plurality of versions to identify a second set of sections of the document; determine, using a similarity analysis model, that a first section from the first set of sections and a second section from the second set of sections correspond to a particular section of the document; determine, using the lineage analysis model, a lineage of the particular section; and indicate the lineage in association with the first section and the second section to facilitate editing of the particular section of the document”; Paragraph [0030], “In some implementations, the document management system may utilize any available metadata (e.g., timestamps, sequence identifiers, and/or user identifiers) to verify that the lineage is accurate. 
For example, the document management system may verify the lineage based on one or more of metadata indicating that D2 was generated prior to D1 and D3, metadata indicating that D1 was generated prior to D3, metadata indicating that D1 was generated from D2 (e.g., according to a first sequence identifier), and/or metadata indicating that D3 was generated from D1 (e.g., according to a second sequence identifier that is subsequent to the first sequence identifier)”) [Examiner’s remarks: Given two versions of a document, the machine learning model compares the difference between the two versions and uses the difference and lineage information to identify the data lineage (which dataset it originates from).].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Walters into the combined teachings of Vasisht and Red Hat to include “… prior version…” and “extract one or more version-difference features representing changes between the version of the [document] and the prior version of the [document], wherein the one or more instructions, that cause the data lineage management device to determine the data lineage information associated with the dataset, cause the data lineage management device to apply the learned correlation patterns to the extracted version-difference features to identify the one or more datasets from which data originates or to which data moves”. As stated in Walters, “Document lineage provides the ability to trace errors in the document, to access past portions or inputs associated with the document (e.g., for reviewing and/or analyzing the document). Document lineage can provide an audit trail of the document.” (paragraph [0001]). Training machine learning models allows the model to better predict parameters related to data lineage detection, and allows for lineage detection with less human intervention. Therefore, it would be obvious to combine lineage detection with a machine learning model with data lineage determination using multiple versions.
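For illustration only, "version-difference features" between a prior version and a current version of source code can be sketched as the sets of added and removed lines. This is a minimal sketch using Python's standard `difflib` module. It is not Walters's similarity or lineage analysis model, and it omits the learned-correlation step entirely. The function name and the returned feature layout are hypothetical.

```python
import difflib


def version_difference_features(prior: str, current: str) -> dict:
    """Return lines added to and removed from `prior` to obtain `current`."""
    added, removed = [], []
    for line in difflib.ndiff(prior.splitlines(), current.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " (unchanged) or "? " (hints) are ignored.
    return {"added": added, "removed": removed}
```

Such features would only be the input to a trained model; how a model maps them to datasets is outside this sketch.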

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over US 20190005117 A1 (hereinafter “Vasisht”), in view of “What is a webhook?” from redhat.com (hereinafter “Red Hat”), further in view of US 20220269884 A1 (hereinafter “Walters”), further in view of US 20160253340 A1 (hereinafter “Barth”), further in view of US 20170270022 A1 (hereinafter “Moresmau”).

Regarding claim 21, the rejection of claim 1 is incorporated; and the combination of Vasisht, Red Hat, and Walters does not explicitly disclose:
- wherein the one or more processors are further configured to automatically trigger an update of lineage metadata stored in the enterprise data management system without user input in response to identifying the data lineage information.
However, Barth discloses:
- wherein the one or more processors are further configured to automatically trigger an update of lineage metadata stored in the enterprise data management system without user input (Paragraph [0009], “The platform, which may operate in a cloud computing architecture as an infrastructure shared by enterprise users [enterprise data management system]… In one embodiment, all (or substantially all) data in the user's analytics environment is registered with the metadata repository, which preferably maintains various types of metadata including, for example, status information (load dates, quality exceptions, access rights, etc.), definitions (business meaning, technical formats, etc.), lineage (data sources and processes creating a data set, etc.), and user data (user rights, access history, user comments, etc.). … Further, preferably the metadata repository automatically updates and provides access for self-service integration, preferably through a graphical user interface (GUI) for analysts to help them find, select, and customize data for their analyses. The system tracks data lineage and allows analysts to collaborate effectively, e.g., documenting and sharing insights into data structure and content [wherein the one or more processors are further configured to automatically trigger an update of lineage metadata stored in the enterprise data management system without user input]”) [Examiner’s remarks: Barth discloses an enterprise data management system (a metadata repository containing metadata shared by enterprise users). The data management system includes a metadata repository which stores at least lineage information as a form of metadata. The metadata repository automatically updates to provide information to users, which indicates that the lineage information in the repository is updated without user input.]…
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Barth into the combined teachings of Vasisht, Red Hat, and Walters to include “automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention”. As stated in Barth, “The system tracks data lineage and allows analysts to collaborate effectively, e.g., documenting and sharing insights into data structure and content. As the system is used, the metadata gets richer and more valuable, supporting additional automation and quality controls” (paragraph [0009]). Metadata regarding the data lineages of source code is important and useful for the entire enterprise to know, so that multiple people may use that information concurrently, as well as for automation. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with an enterprise data management system.
The combination of Vasisht, Red Hat, Walters, and Barth does not explicitly disclose:
… in response to identifying the data lineage information.
However, Moresmau discloses:
… in response to identifying the data lineage information (Paragraph [0028], “In loading the code, the relevant information (such as metadata information) can be stored in a repository. Thus, every time something changes in the code, an incremental analysis can be conducted to see what has changed since the previous code analysis. The data lineage presented herein is generally a snapshot of the flow of a selected data element throughout enterprise systems through time. In various embodiments, the data lineage snapshot can be run on a periodic basis, such as daily, weekly, biweekly, monthly, annually, or at any other time increment. In other embodiments, the data lineage can be run as directed by a user”; Paragraph [0104], “Additionally, the system can consolidate lineages crossing various technologies, data-stores, and platforms into a single repository supporting querying and visualization of end-to-end data lineage”) [Examiner’s remarks: Data lineage for a code may be derived without user intervention and added to a database in response to identifying the information. This may be combined with the automated update of the relational database of Barth to achieve the present limitation.].
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Moresmau into the combined teachings of Vasisht, Red Hat, Walters, and Barth to include “in response to identifying the data lineage information”. As stated in Moresmau, “Therefore, there is a need to manage information about the data, known as metadata, to understand complex relationships between objects from a variety of perspectives. Also, as an enterprise continuously goes through change, business and IT professionals have a need to quickly understand the source of data and the impact of change across different systems, different platforms, different programming languages, and different data stores” (paragraph [0003]). Automatically storing and updating data lineage information allows users quick and easy access to information regarding where the source of a piece of data or code may be, which aids in coordination over large enterprises. Therefore, it would be obvious to one of ordinary skill in the art to combine data lineage detection with automated updates of an enterprise data management system.

Response to Arguments
Applicant's arguments filed December 30, 2025 have been fully considered but they are not persuasive. 

Regarding the rejection under 35 U.S.C. 101, Applicant argues:
First, amended claim 1 is not directed to a mental process. 
The Office Action asserts that the limitations "identify that a version of a source code has been uploaded to a software development hosting system" and "determine, using a data lineage analysis model, data lineage information associated with a dataset related to the version of the source code" recite a mental process because they allegedly encompass "a human mind carrying out the function through observation, evaluation, judgement, and/or opinion, or even with the aid of pen and paper." See Office Action, page 2. 
Applicant respectfully disagrees. Amended claim 1 requires that the one or more processors determine data lineage information that identifies one or more datasets from which data originates or to which data moves, and that are not explicitly defined in the version of the source code, and further requires that the data lineage information is determined based on learned correlation patterns identified by a machine learning model between source version changes and dataset lineage changes from a plurality of versions of source code. These limitations cannot be performed by a human mind. A human cannot identify learned correlation patterns across a plurality of versions of source code and corresponding dataset lineage changes, nor can a human determine data lineage information that is not explicitly defined in the source code using pen and paper. Accordingly, the amended claim recites computer-specific analysis performed by a trained machine learning model, not a mental process. 
Moreover, claim 1 further requires automatically posting the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention. This limitation requires a system-initiated update of stored enterprise metadata without any user input, which is not a function that can be performed by a human mind. Therefore, amended claim 1 does not fall within the "Mental Processes" grouping of abstract ideas under Prong 1. 
See Remarks – Pages 9-10

Examiner’s Response:
	Examiner respectfully disagrees. Regarding Applicant’s amended claims, Applicant argues that “These limitations cannot be performed by a human mind. A human cannot identify learned correlation patterns across a plurality of versions of source code and corresponding dataset lineage changes, nor can a human determine data lineage information that is not explicitly defined in the source code using pen and paper”. Identifying patterns in data, in this case between source code versions and corresponding data lineage changes, is a process that the human mind is capable of performing. Humans are also capable of inferring or determining data lineage from source code without the information being explicitly defined. Therefore, the cited limitations fall under the “Mental Processes” grouping of abstract ideas. Merely reciting the use of a processor and machine learning model is analyzed under Prong 2 of the Alice framework as mere application of generic computer/computer components, which does not amount to practical application, nor amount to significantly more under Step 2B. See MPEP 2106.05(f).
	Regarding the argument that automatically posting lineage data to an enterprise data management system is not a mental step, Examiner agrees. However, this limitation is not rejected under Prong 1 of the Alice framework for being a mental process, but rather under Prong 2 and Step 2B as merely saving and retrieving data from memory, which does not amount to practical application nor to significantly more. See MPEP 2106.05(g) and MPEP 2106.05(d).

In the remarks, Applicant Argues:
Second, amended claim 1 integrates any alleged abstract idea into a practical application. 
MPEP §2106.05(a) explains that a claim is integrated into a practical application when it "applies, relies on, or uses the judicial exception in a manner that imposes a meaningful limit on the judicial exception." 
Here, amended claim 1 recites an ordered combination in which a trained data lineage analysis model identifies datasets from which data originates or to which data moves that are not explicitly defined in the source code, based on learned correlation patterns between source version changes and dataset lineage changes, and then automatically posts that data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without manual user intervention. This sequence does not merely gather or transmit data. Instead, it automatically updates stored lineage metadata, thereby changing the state of an enterprise data management system and eliminating manual lineage identification. 
Accordingly, any alleged abstract idea is meaningfully limited by its application to automatically updating stored lineage metadata in an enterprise data management system without user input, which constitutes a concrete technological improvement to enterprise data governance workflows. 

Examiner’s Response:
	Examiner respectfully disagrees. The limitation defining automatic updating of stored lineage metadata is not presently rejected as mere data gathering and transmission, but rather as merely storing and retrieving information from memory. As determined by the courts, merely storing and retrieving data from memory is considered a well-known, routine, and conventional activity, which does not integrate the judicial exception into a practical application. See MPEP 2106.05(g).

In the remarks, Applicant Argues:
Third, amended claim 1 recites significantly more than any alleged abstract idea. 
The Office Action asserts that the claims do not include additional elements sufficient to amount to significantly more, stating that "mere data gathering and transmitting" are well-understood, routine, and conventional activities. See Office Action, page 6. 
However, amended claim 1 does not merely recite data gathering or transmission. The claim requires automatically posting data lineage information to update stored lineage metadata associated with the dataset without manual user intervention, based on learned correlation patterns identified by a trained machine learning model. 
As explained in MPEP §2106.05(d), an activity is not well-understood, routine, or conventional simply because it is performed on a computer. The Office Action provides no evidence that automatically updating stored lineage metadata in an enterprise data management system, without user input and based on learned correlations between source code changes and dataset lineage changes, is a well-understood, routine, or conventional practice. To the contrary, the Office Action acknowledges elsewhere that the cited prior art does not explicitly disclose posting data lineage information to an enterprise data management system, confirming that this functionality is not conventional. Accordingly, when considered as an ordered combination, the additional elements of amended claim 1 amount to significantly more than any alleged judicial exception. 
For at least these reasons, amended claim 1 is patent-eligible under 35 U.S.C. § 101. Amended claims 8 and 14 recite similar subject matter. Therefore, amended claims 1, 8, and 14, and the claims dependent thereon, are patent-eligible under 35 U.S.C. § 101. 
Accordingly, Applicant respectfully requests that the Examiner reconsider and withdraw the rejection of claims 1-20 under 35 U.S.C. § 101. 

Examiner’s Response:
Examiner respectfully disagrees. As mentioned above, the automated posting of data lineage information is rejected not as mere data gathering or as conventional for simply being performed on a computer. Rather, it is rejected as merely saving and retrieving information from memory, which has been determined by the courts to be well-known, routine, and conventional activity. See MPEP 2106.05(d). Furthermore, lack of explicit disclosure in a portion of the cited prior art does not mean an activity is not well-known, routine, or conventional. Therefore, the cited limitation does not amount to significantly more than the judicial exception.

Regarding Applicant’s arguments under 35 U.S.C. 103, Applicant argues:
Claims 14 and 17-19 stand rejected under 35 U.S.C. § 103 as allegedly being unpatentable over VASISHT (U.S. Patent Publication No. 2019/0005117), RED HAT ("What is a webhook?"), and WALTERS (U.S. Patent Publication No. 2022/0269884). Applicant respectfully traverses the rejection. 
Applicant respectfully submits that VASISHT, RED HAT, and WALTERS do not disclose each and every feature recited in amended claim 14. For example, VASISHT, RED HAT, and WALTERS do not disclose one or more instructions that, when executed by one or more processors of a data lineage management device, cause the data lineage management device to at least "automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input," as recited in amended claim 14. 
In rejecting claim 1, the Office Action concedes that "[t]he combination of Vasisht, Red Hat, and Walters does not explicitly disclose [] post the data lineage information to an enterprise data management system." See Office Action, page 21. In turn, VASISHT, RED HAT, and WALTERS do not disclose at least "automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input," as recited in amended claim 14. Therefore, amended claim 14, and the claims that depend thereon, are patentable over the cited sections of the applied references. 
Accordingly, Applicant respectfully requests that the Examiner reconsider and withdraw the rejection of claims 14 and 17-19 under 35 U.S.C. § 103 based on VASISHT, RED HAT, and WALTERS. 

Examiner’s response:
	Applicant’s arguments with respect to claim(s) 14 and 17-19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

In the remarks, Applicant argues:
Applicant respectfully submits that VASISHT, RED HAT, and WALTERS do not disclose each and every feature recited in amended claim 1. For example, VASISHT, RED HAT, and WALTERS do not disclose one or more processors configured to at least "automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input," as recited in amended claim 1. 
As stated above, in rejecting claim 1, the Office Action concedes that "[t]he combination of Vasisht, Red Hat, and Walters does not explicitly disclose [] post the data lineage information to an enterprise data management system." See Office Action, page 21. 
Attempting to resolve this deficiency, the Office Action relies on BARTH's paragraph 9 and asserts that BARTH "discloses an enterprise data management system (a metadata repository containing metadata shared by enterprise users) in which information, including data lineage information can be posted for tracking." See Office Action, pages 21-22. 
However, as stated in the Office Action, BARTH's paragraph 9 recites: "The platform, which may operate in a cloud computing architecture as an infrastructure shared by enterprise users ... In one embodiment, all (or substantially all) data in the user's analytics environment is registered with the metadata repository, which preferably maintains various types of metadata including, for example, status information (load dates, quality exceptions, access rights, etc.), definitions (business meaning, technical formats, etc.), lineage (data sources and processes creating a data set, etc.), and user data (user rights, access history, user comments, etc.). ... Further, preferably the metadata repository automatically updates and provides access for self-service integration, preferably through a graphical user interface (GUI) for analysts to help them find, select, and customize data for their analyses." Accordingly, the Office Action asserts that BARTH "discloses an enterprise data management system (a metadata repository containing metadata shared by enterprise users) in which information, including data lineage information can be posted for tracking." See Office Action, pages 21-22. 
Accordingly, at most, BARTH describes a metadata repository that maintains and presents metadata, including lineage information, for enterprise users and analysts. BARTH does not disclose one or more processors configured to automatically post the data lineage information to an enterprise data management system to update stored lineage metadata associated with the dataset without user input, as expressly required by amended claim 1. Rather, BARTH's paragraph 9 emphasizes that the metadata repository "provides access for self-service integration, preferably through a graphical user interface (GUI) for analysts," and further describes analysts documenting, sharing, and customizing data. Such disclosures necessarily involve manual user interaction and analyst-driven workflows, and therefore do not teach or suggest the claimed automatic posting of data lineage information without user input. Moreover, BARTH does not disclose that newly determined data lineage information is automatically posted in response to source code changes or used to update stored lineage metadata associated with a dataset without manual intervention. Thus, even when considered in view of BARTH, the combination of VASISHT, RED HAT, and WALTERS still fails to disclose or suggest the automatic posting limitation recited in amended claim 1.
For at least the foregoing reasons, amended claim 1 is patentable. Amended claim 8 recites similar features. Therefore, amended claims 1 and 8, and the claims that depend thereon, are patentable over the cited sections of the applied references.

Examiner’s Response:
	Examiner respectfully disagrees. Applicant argues that “BARTH's paragraph 9 emphasizes that the metadata repository "provides access for self-service integration, preferably through a graphical user interface (GUI) for analysts," and further describes analysts documenting, sharing, and customizing data. Such disclosures necessarily involve manual user interaction and analyst-driven workflows, and therefore do not teach or suggest the claimed automatic posting of data lineage information without user input”. Barth discloses “further, preferably the metadata repository automatically updates and provides access for self-service integration, preferably through a graphical user interface (GUI) for analysts to help them find, select, and customize data for their analysis” (Paragraph [0009]). The self-service function to “find, select, and customize data for their analysis” is separate from the automated updating of the metadata repository itself. Therefore, Barth does not necessitate user interaction for updating (posting) metadata, but rather only provides the option for users to retrieve and customize data for their own analysis. Applicant further argues that “BARTH does not disclose that newly determined data lineage information is automatically posted in response to source code changes or used to update stored lineage metadata associated with a dataset without manual intervention”. The current iteration of the claims does not specify that the posting of the data lineage information should happen automatically “in response to source code changes”. Barth discloses a metadata repository including data lineage information, which tracks updates and is automatically updated, thereby disclosing automatically posting to update stored lineage metadata. The tracking is done by the platform, and therefore does not require manual intervention. Therefore, Barth discloses the added limitation and the rejection under 35 U.S.C. 103 is maintained.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIVIAN WEIJIA DUAN whose telephone number is (703)756-5442. The examiner can normally be reached Monday-Friday 8:30AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wei Y Mui, can be reached at (571) 272-3708. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/V.W.D./Examiner, Art Unit 2191
/WEI Y MUI/Supervisory Patent Examiner, Art Unit 2191
Prosecution Timeline

Jul 23, 2025: Response Filed
Nov 04, 2025: Final Rejection mailed (§101, §103)
Dec 11, 2025: Interview Requested
Dec 30, 2025: Response after Non-Final Action
Feb 02, 2026: Request for Continued Examination
Feb 09, 2026: Response after Non-Final Action
Apr 01, 2026: Non-Final Rejection mailed (§101, §103)
May 28, 2026: Interview Requested
Precedent Cases

Applications granted by this same examiner with similar technology

18/243,710 (Patent 12619405): METHOD AND SYSTEM FOR INCREMENTAL FUNCTIONAL APPROACH-BASED DATAFLOW ANALYSIS
Granted May 05, 2026 (2y 8m to grant)
18/268,073 (Patent 12541357): Operating System Upgrading Method, Electronic Device, Storage Medium, and Chip System
Granted Feb 03, 2026 (2y 7m to grant)
18/168,161 (Patent 12536005): TRANSFORMING A JAVA PROGRAM USING A SYMBOLIC DESCRIPTION LANGUAGE MODEL
Granted Jan 27, 2026 (2y 11m to grant)
18/104,154 (Patent 12498914): ORCHESTRATION OF SOFTWARE RELEASES ON A CLOUD PLATFORM
Granted Dec 16, 2025 (2y 10m to grant)
18/340,919 (Patent 12481483): AUTOMATED GENERATION OF WEB APPLICATIONS BASED ON WIREFRAME METADATA GENERATED FROM USER REQUIREMENTS
Granted Nov 25, 2025 (2y 5m to grant)
Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 73%
With Interview (+54.2%): 99%
Median Time to Grant: 2y 7m (~0m remaining)
PTA Risk: High
Based on 11 resolved cases by this examiner. Grant probability derived from career allowance rate.