Tutorial | Git for projects#

This tutorial will focus on using Git for Dataiku projects.

Get started#

Objectives#

This tutorial will walk through the active use of the Git integration to:

  • Connect a local project to a remote Git repository.

  • Create a working branch for the project to make local changes.

  • Push and merge your changes.

  • Pull the changes from the merge into the local base branch.

Prerequisites#

To complete this tutorial, you should have:

  • Dataiku version 12+.

  • A firm understanding of the Git model and terminology.

  • An empty remote Git repository for this project specifically.

  • A Dataiku instance that has been set up to work with remote Git repositories. Refer to Working with Git for help.

Create the project#

This tutorial will use the Git for Projects project.

  1. From the Dataiku Design homepage, click +New Project > DSS tutorials > Advanced Designer > Git for Projects.

Note

You can also download the starter project from this website and import it as a zip file.

Basic Git functionality#

Each change that you make in a Dataiku project is automatically committed to a local Git repository. Thus, any normal contribution to a project passively uses the Git integration for projects.

If you only want to use the local repository, you can still perform actions like reverting to previous commits (changes). See How-to | Undo actions in Dataiku for details.

Next, we’ll go beyond this default feature and actively use Git.

Connect to a remote Git repository#

First, we’ll connect our project to a remote Git repository. Each project must have its own repository.

  1. From the More Options (…) menu in the top navigation bar, select Version Control. This will show that we are on the master branch of the project.

  2. Click on the change tracking indicator and select Add remote.

  3. Enter the URL of the remote and click OK.

  4. From the change tracking indicator, select Push.

  5. In your remote Git repository, verify that the master branch has been successfully pushed.

GitHub screenshot of the project pushed.
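For readers who think in Git commands, the Add remote and Push steps above correspond roughly to git remote add followed by git push. The sketch below reproduces that sequence in a throwaway directory through Python's subprocess module. It is only an illustration: the directory names, the flow.txt placeholder file, and the helper functions are assumptions made for this example, a local bare repository stands in for the real remote, and nothing here describes how Dataiku runs Git internally.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Throwaway settings so commits work without any global Git configuration.
GIT_OPTIONS = ["-c", "user.name=Tutorial User",
               "-c", "user.email=tutorial@example.com",
               "-c", "commit.gpgsign=false"]


def git(*args: str, cwd: Path) -> str:
    """Run one git command inside cwd and return its trimmed stdout."""
    done = subprocess.run(["git", *GIT_OPTIONS, *args], cwd=cwd,
                          check=True, capture_output=True, text=True)
    return done.stdout.strip()


def connect_and_push(workdir: Path) -> list[str]:
    """Create a local repository, add a remote, push master, list remote branches."""
    remote = workdir / "remote.git"   # stand-in for the empty remote repository
    project = workdir / "project"     # stand-in for the project's local repository
    remote.mkdir()
    project.mkdir()
    git("init", "--bare", cwd=remote)

    # Local repository with one commit on master (placeholder content).
    git("init", cwd=project)
    git("symbolic-ref", "HEAD", "refs/heads/master", cwd=project)
    (project / "flow.txt").write_text("placeholder project content\n")
    git("add", ".", cwd=project)
    git("commit", "-m", "Initial project state", cwd=project)

    # Equivalent of "Add remote", then "Push".
    git("remote", "add", "origin", str(remote), cwd=project)
    git("push", "origin", "master", cwd=project)

    # Branches that now exist in the remote repository.
    return git("for-each-ref", "--format=%(refname:short)", "refs/heads",
               cwd=remote).splitlines()


if __name__ == "__main__" and shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        print(connect_and_push(Path(tmp)))
```

Listing the branches of the remote at the end plays the same role as step 5: it confirms that master arrived in the remote repository.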

Branch the project#

Next, we’ll create a new branch.

  1. From the branch indicator, click Create new branch.

  2. Name the new branch prune-flow and click Next.

     Screenshot of the Create branch dialogue window.

  3. Click Duplicate and Create Branch.

This creates a duplicate project working on the prune-flow branch.

Important

A given Dataiku project can only be on one branch at any given time. This is why you have to make a duplicate project for the new branch.

If you instead switch the branch of the current project, the switch affects all collaborators. If a colleague is working on one branch when you move the project to another, conflicts may result.
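In plain Git terms, a loose analogy for the duplicate project is a second working copy of the same repository: each working copy has exactly one branch checked out, so the original can stay on master while the copy moves to prune-flow. The sketch below illustrates that idea only. The directory names, the flow.txt placeholder file, and the helper functions are assumptions for this example, and the way Dataiku actually duplicates a project may differ.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Throwaway settings so commits work without any global Git configuration.
GIT_OPTIONS = ["-c", "user.name=Tutorial User",
               "-c", "user.email=tutorial@example.com",
               "-c", "commit.gpgsign=false"]


def git(*args: str, cwd: Path) -> str:
    """Run one git command inside cwd and return its trimmed stdout."""
    done = subprocess.run(["git", *GIT_OPTIONS, *args], cwd=cwd,
                          check=True, capture_output=True, text=True)
    return done.stdout.strip()


def seed_remote(workdir: Path) -> Path:
    """Create an 'original' working copy and push its master to a bare remote."""
    remote, original = workdir / "remote.git", workdir / "original"
    remote.mkdir()
    original.mkdir()
    git("init", "--bare", cwd=remote)
    git("init", cwd=original)
    git("symbolic-ref", "HEAD", "refs/heads/master", cwd=original)
    (original / "flow.txt").write_text("placeholder project content\n")
    git("add", ".", cwd=original)
    git("commit", "-m", "Initial project state", cwd=original)
    git("remote", "add", "origin", str(remote), cwd=original)
    git("push", "origin", "master", cwd=original)
    return remote


def branch_in_duplicate(workdir: Path) -> dict[str, str]:
    """Clone the remote into a second working copy and branch there."""
    remote = seed_remote(workdir)
    original, duplicate = workdir / "original", workdir / "duplicate"
    git("clone", "--branch", "master", str(remote), str(duplicate), cwd=workdir)
    git("checkout", "-b", "prune-flow", cwd=duplicate)
    # Report which branch each working copy currently has checked out.
    return {
        "original": git("rev-parse", "--abbrev-ref", "HEAD", cwd=original),
        "duplicate": git("rev-parse", "--abbrev-ref", "HEAD", cwd=duplicate),
    }


if __name__ == "__main__" and shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        print(branch_in_duplicate(Path(tmp)))
```

The returned mapping shows the point of the duplicate: creating prune-flow in the second working copy leaves the first one on master.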

Make changes on the branch#

Now we can make our changes to the duplicate project on the prune-flow branch without disturbing the rest of the data team’s use of the master branch of the project.

  1. Go to the Flow (keyboard shortcut G+F) and see that the Flow forks three ways from the Orders_enriched_prepared dataset.

     Screenshot of the entire Flow.

  2. Delete the Orders_by_Country_Category and Orders_filtered datasets.

Screenshot of the entire Flow after pruning.

Push changes to the remote repository#

To make your changes appear in the remote repository:

  1. From the project menu in the top navigation bar, select Version Control.

  2. From the change tracking indicator, select Push.

You will see that the prune-flow branch has been pushed to your remote Git repository.
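As a rough Git-level picture of this section and the previous one, deleting the two datasets produces a commit on prune-flow, and Push publishes that branch next to master in the remote. The sketch below mimics this with placeholder files. The .json file names merely borrow the dataset names and are hypothetical, as are the directory names and helper functions; the real layout of a Dataiku project repository is not shown here.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Throwaway settings so commits work without any global Git configuration.
GIT_OPTIONS = ["-c", "user.name=Tutorial User",
               "-c", "user.email=tutorial@example.com",
               "-c", "commit.gpgsign=false"]
# Hypothetical placeholder files named after the tutorial's datasets.
DATASETS = ["Orders_enriched_prepared", "Orders_by_Country_Category", "Orders_filtered"]
PRUNED = DATASETS[1:]


def git(*args: str, cwd: Path) -> str:
    """Run one git command inside cwd and return its trimmed stdout."""
    done = subprocess.run(["git", *GIT_OPTIONS, *args], cwd=cwd,
                          check=True, capture_output=True, text=True)
    return done.stdout.strip()


def prune_and_push(workdir: Path) -> dict[str, list[str]]:
    """Push master, prune two files on prune-flow, push the branch, list remote contents."""
    remote, original = workdir / "remote.git", workdir / "original"
    duplicate = workdir / "duplicate"
    remote.mkdir()
    original.mkdir()
    git("init", "--bare", cwd=remote)
    git("init", cwd=original)
    git("symbolic-ref", "HEAD", "refs/heads/master", cwd=original)
    for name in DATASETS:
        (original / f"{name}.json").write_text("{}\n")
    git("add", ".", cwd=original)
    git("commit", "-m", "Initial Flow", cwd=original)
    git("remote", "add", "origin", str(remote), cwd=original)
    git("push", "origin", "master", cwd=original)

    # Work on the branch in a second working copy.
    git("clone", "--branch", "master", str(remote), str(duplicate), cwd=workdir)
    git("checkout", "-b", "prune-flow", cwd=duplicate)
    git("rm", *[f"{name}.json" for name in PRUNED], cwd=duplicate)
    git("commit", "-m", "Prune the Flow", cwd=duplicate)

    # Equivalent of "Push" from the duplicate project.
    git("push", "origin", "prune-flow", cwd=duplicate)

    # Top-level files on each branch of the remote.
    branches = git("for-each-ref", "--format=%(refname:short)", "refs/heads",
                   cwd=remote).splitlines()
    return {b: git("ls-tree", "--name-only", b, cwd=remote).splitlines()
            for b in branches}


if __name__ == "__main__" and shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        for branch, files in prune_and_push(Path(tmp)).items():
            print(branch, files)
```

After the push, the remote holds both branches: master still lists all three placeholder files, while prune-flow lists only the one that was kept.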

Merge branch to master#

Merge these changes into the master branch as you normally would, outside of Dataiku (for example, through a pull request on your Git hosting service).

Screenshot of the repository page in GitHub.
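If you merge from the command line rather than through a hosting service's web interface, the operation amounts to merging prune-flow into master and pushing the result. The sketch below shows one way to do that with a merge commit (--no-ff); your team may prefer a different merge strategy. As before, the directory names, the placeholder .json files, and the helper functions are assumptions for this example, and a third working copy called reviewer stands in for whoever performs the merge.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Throwaway settings so commits work without any global Git configuration.
GIT_OPTIONS = ["-c", "user.name=Tutorial User",
               "-c", "user.email=tutorial@example.com",
               "-c", "commit.gpgsign=false"]
# Hypothetical placeholder files named after the tutorial's datasets.
DATASETS = ["Orders_enriched_prepared", "Orders_by_Country_Category", "Orders_filtered"]
PRUNED = DATASETS[1:]


def git(*args: str, cwd: Path) -> str:
    """Run one git command inside cwd and return its trimmed stdout."""
    done = subprocess.run(["git", *GIT_OPTIONS, *args], cwd=cwd,
                          check=True, capture_output=True, text=True)
    return done.stdout.strip()


def push_master_and_branch(workdir: Path) -> Path:
    """Seed a remote with master (three files) and prune-flow (two files deleted)."""
    remote, original = workdir / "remote.git", workdir / "original"
    duplicate = workdir / "duplicate"
    remote.mkdir()
    original.mkdir()
    git("init", "--bare", cwd=remote)
    git("init", cwd=original)
    git("symbolic-ref", "HEAD", "refs/heads/master", cwd=original)
    for name in DATASETS:
        (original / f"{name}.json").write_text("{}\n")
    git("add", ".", cwd=original)
    git("commit", "-m", "Initial Flow", cwd=original)
    git("remote", "add", "origin", str(remote), cwd=original)
    git("push", "origin", "master", cwd=original)
    git("clone", "--branch", "master", str(remote), str(duplicate), cwd=workdir)
    git("checkout", "-b", "prune-flow", cwd=duplicate)
    git("rm", *[f"{name}.json" for name in PRUNED], cwd=duplicate)
    git("commit", "-m", "Prune the Flow", cwd=duplicate)
    git("push", "origin", "prune-flow", cwd=duplicate)
    return remote


def merge_branch_into_master(workdir: Path) -> list[str]:
    """Merge prune-flow into master in a separate clone and push the result."""
    remote = push_master_and_branch(workdir)
    reviewer = workdir / "reviewer"
    git("clone", "--branch", "master", str(remote), str(reviewer), cwd=workdir)
    git("merge", "--no-ff", "-m", "Merge prune-flow into master",
        "origin/prune-flow", cwd=reviewer)
    git("push", "origin", "master", cwd=reviewer)
    # Top-level files on the remote's master after the merge.
    return git("ls-tree", "--name-only", "master", cwd=remote).splitlines()


if __name__ == "__main__" and shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        print(merge_branch_into_master(Path(tmp)))
```

Once the merge is pushed, master in the remote contains only the file that survived the pruning, which is the state the original project picks up in the next section.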

Note

Branching and Merge Conflicts: This tutorial describes an extremely simple branch and merge. If multiple collaborators each create a separate branch off of master, and then try to merge their separate branches back to master, they are likely to encounter Git merge conflicts. These can be difficult to resolve, and we may not be able to solve them for you. Your data team should agree on a plan for how to collaborate on projects using Git in order to avoid merge conflicts.

Pull master changes to local#

Finally, to see the merge reflected in Dataiku:

  1. Return to the original project.

  2. From the change tracking indicator, Fetch the changes from the remote Git repository.

  3. Pull the changes into your local Git repository.

Dataiku screenshot highlighting the Fetch and Pull options of the Project version control page.
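The difference between the two steps is easier to see at the Git level: a fetch only updates the local record of what the remote contains, while a pull also brings those commits into the checked-out branch. The sketch below rebuilds the whole scenario with placeholder files and then fetches and pulls in the original working copy. The directory names, the hypothetical .json files, and the helper functions are assumptions for this example, and the merge is done from the duplicate working copy purely to keep the setup short.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Throwaway settings so commits work without any global Git configuration.
GIT_OPTIONS = ["-c", "user.name=Tutorial User",
               "-c", "user.email=tutorial@example.com",
               "-c", "commit.gpgsign=false"]
# Hypothetical placeholder files named after the tutorial's datasets.
DATASETS = ["Orders_enriched_prepared", "Orders_by_Country_Category", "Orders_filtered"]
PRUNED = DATASETS[1:]


def git(*args: str, cwd: Path) -> str:
    """Run one git command inside cwd and return its trimmed stdout."""
    done = subprocess.run(["git", *GIT_OPTIONS, *args], cwd=cwd,
                          check=True, capture_output=True, text=True)
    return done.stdout.strip()


def build_merged_remote(workdir: Path) -> Path:
    """Push master, prune on prune-flow, merge into master, push; return 'original'."""
    remote, original = workdir / "remote.git", workdir / "original"
    duplicate = workdir / "duplicate"
    remote.mkdir()
    original.mkdir()
    git("init", "--bare", cwd=remote)
    git("init", cwd=original)
    git("symbolic-ref", "HEAD", "refs/heads/master", cwd=original)
    for name in DATASETS:
        (original / f"{name}.json").write_text("{}\n")
    git("add", ".", cwd=original)
    git("commit", "-m", "Initial Flow", cwd=original)
    git("remote", "add", "origin", str(remote), cwd=original)
    git("push", "origin", "master", cwd=original)
    git("clone", "--branch", "master", str(remote), str(duplicate), cwd=workdir)
    git("checkout", "-b", "prune-flow", cwd=duplicate)
    git("rm", *[f"{name}.json" for name in PRUNED], cwd=duplicate)
    git("commit", "-m", "Prune the Flow", cwd=duplicate)
    git("checkout", "master", cwd=duplicate)
    git("merge", "--no-ff", "-m", "Merge prune-flow into master", "prune-flow",
        cwd=duplicate)
    git("push", "origin", "master", cwd=duplicate)
    return original


def fetch_and_pull(workdir: Path) -> tuple[int, list[str]]:
    """Fetch, count how far master lags behind the remote, then pull."""
    original = build_merged_remote(workdir)
    # Fetch: refresh origin/master without touching the local master branch.
    git("fetch", "origin", cwd=original)
    behind = int(git("rev-list", "--count", "master..origin/master", cwd=original))
    # Pull: fast-forward the local master to the fetched state.
    git("pull", "--ff-only", "origin", "master", cwd=original)
    files = sorted(p.name for p in original.iterdir() if p.name != ".git")
    return behind, files


if __name__ == "__main__" and shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        print(fetch_and_pull(Path(tmp)))
```

In this sketch the fetch reveals that the local master is behind the remote, and the pull then removes the two pruned placeholder files from the original working copy.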

What’s next?#

To learn more about the other ways Dataiku integrates with Git, see Working with Git.