Concept | Data lineage


The Data Lineage view allows you to track where data originated, how and where it changed over time, and its journey through your data pipeline.

Data Lineage helps in two directions. Upstream, you can investigate the root cause of a data issue by tracking the changes made to a column. Downstream, you can identify the impact of changes you’d like to make to a dataset and notify affected users.
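Conceptually, column-level lineage is a directed graph in which each edge links a source column to a column derived from it. Walking the edges backward answers root-cause questions, and walking them forward answers impact questions. The Python sketch below illustrates only that idea; it is not how Dataiku computes lineage, and the `(project, dataset, column)` identifiers, the `EDGES` list, and the `trace` function are hypothetical.

```python
from collections import deque

# Hypothetical column identifiers: (project, dataset, column).
# The last edge crosses from one project into another.
EDGES = [
    (("SALES", "orders_raw", "amount"), ("SALES", "orders_clean", "amount")),
    (("SALES", "orders_clean", "amount"), ("SALES", "orders_by_month", "total")),
    (("SALES", "orders_by_month", "total"), ("REPORTING", "kpis", "revenue")),
]


def trace(base, column_edges, direction):
    """Return every column reachable from `base` upstream or downstream."""
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    # Build an adjacency list that points in the requested direction.
    neighbours = {}
    for src, dst in column_edges:
        if direction == "upstream":
            neighbours.setdefault(dst, []).append(src)
        else:
            neighbours.setdefault(src, []).append(dst)
    # Breadth-first search; `seen` also guards against cycles.
    seen, queue = set(), deque([base])
    while queue:
        for nxt in neighbours.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


base = ("SALES", "orders_clean", "amount")
print(sorted(trace(base, EDGES, "upstream")))    # root-cause investigation
print(sorted(trace(base, EDGES, "downstream")))  # impact analysis
```

The upstream call returns only the raw source column, while the downstream call also reaches the `REPORTING` project, which is the kind of cross-project impact the chart makes visible.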

Data Lineage view

The Data Lineage view is a chart showing a selected column and its transformations from data ingestion to the end of a data pipeline, including across projects. Follow the connecting lines in the chart to see how the column was created and how it is used in the pipeline. You can open this view from the Data Catalog, from a dataset, or from a dataset’s right panel.

Screenshot of the Data Lineage view for a sample project.

Chart elements

The chart includes the following elements.

The top bar shows the base project, dataset, and column that the lineage originated from. Click on any of them to navigate to that element, or use the Change Lineage button to choose another project, dataset, or column to view.

The top bar shows the base project, dataset, and column.

The lineage chart includes boxes for all the datasets in the column’s lineage. These boxes include:

  • The dataset name at the top.

  • The project name (shown in grey for the base project, and in a randomly assigned color for each other project).

  • The Flow zone the dataset is in, if applicable.

  • The base column for the lineage, highlighted in blue.

You can double-click on the dataset name, project name, zone name, or recipe icon to open them in another tab.

Dataset boxes show the dataset name, project, and Flow zone.

To denote the lineage of a column, the dataset boxes are connected by two different kinds of lines:

  • Grey lines that connect datasets, with recipes shown in the middle.

  • Blue lines that connect columns in the lineage.

Dataset boxes are connected with two kinds of lines.
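The relationship between the two kinds of lines can be sketched as follows, assuming each blue column link records the recipe it passes through. Collapsing column links into (input dataset, recipe, output dataset) triples yields the grey dataset-level connections, so several blue lines can share a single grey line. `ColumnLink`, `dataset_links`, and the sample dataset and recipe names are hypothetical and do not describe Dataiku’s internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnLink:
    """A blue line: a source column flowing through a recipe into an output column."""
    src_dataset: str
    src_column: str
    recipe: str
    dst_dataset: str
    dst_column: str


def dataset_links(column_links):
    """Collapse column links into grey (input dataset, recipe, output dataset) links."""
    return {(link.src_dataset, link.recipe, link.dst_dataset) for link in column_links}


LINKS = [
    # `amount` in orders_clean is computed from both `price` and `qty`.
    ColumnLink("orders_raw", "price", "prepare_orders", "orders_clean", "amount"),
    ColumnLink("orders_raw", "qty", "prepare_orders", "orders_clean", "amount"),
    ColumnLink("orders_clean", "amount", "group_by_month", "orders_by_month", "total"),
]

for grey in sorted(dataset_links(LINKS)):
    print(grey)
```

Here three blue links collapse into two grey links, because `price` and `qty` both pass through the same recipe into the same dataset.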

Defining lineage

Some recipes show a blue question mark button in the top-right corner. This means the recipe’s column lineage could not be computed automatically with certainty, and is instead based on simple name-based matching.

You can review and update the lineage by clicking on the blue button and adding or removing column relationships as necessary.

You can define relationships among data columns.
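The following is a minimal sketch of what simple name-based matching means, and of why reviewing it helps. It assumes exact, case-sensitive name equality between a recipe’s input and output columns; Dataiku’s actual matching rules may differ, and `name_based_lineage` and `apply_corrections` are hypothetical helpers.

```python
def name_based_lineage(input_columns, output_columns):
    """Guess (input, output) column pairs by exact name match (hypothetical heuristic)."""
    inputs = set(input_columns)
    return {(name, name) for name in output_columns if name in inputs}


def apply_corrections(relations, added=(), removed=()):
    """Apply user-reviewed additions and removals to the guessed relationships."""
    return (set(relations) - set(removed)) | set(added)


inputs = ["customer_id", "amount", "status"]
outputs = ["customer_id", "total", "status"]  # `status` is recomputed from `amount`

guessed = name_based_lineage(inputs, outputs)
reviewed = apply_corrections(
    guessed,
    added=[("amount", "total"), ("amount", "status")],
    removed=[("status", "status")],
)
print(sorted(reviewed))
# [('amount', 'status'), ('amount', 'total'), ('customer_id', 'customer_id')]
```

In this example, name matching misses that `total` comes from `amount`, and it wrongly links the input `status` to the recomputed output `status`. Both errors are fixed only by adding and removing relationships by hand, which is why lineage based on name matching is worth reviewing.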

Right panel

Select a dataset box in the chart to open the right panel for that dataset. This panel includes three tabs:

  • Details, which includes several actions, information about the project, and related Flow elements.

  • Schema, with buttons to change the lineage base column.

  • Data quality, with data quality rules and statuses, if they have been set up.

The right panel includes three different tabs with information and navigation.

Exporting the chart

You can export the data lineage chart as a PDF or image using the Export button in the bottom right.

Use the button in the bottom right to export the chart.

Accessing the view

You can access the Data Lineage view in several ways:

  • From a dataset Explore view or Prepare recipe: Right-click on a column name and select See column lineage.

  • From the right panel within a dataset: Go to the Schema tab and click the data lineage icon.

  • From the Data Catalog: Click on the Data Lineage tab and select the project, dataset, and column you want to view.

The Data Lineage tab in the Data Catalog.