Tutorial | Cloning a library from a remote Git repository#

Can you import code from Git to be used within a Dataiku project? Yes!

An important end goal of writing code is to be able to reuse it, whether within a Dataiku project, across projects within a Dataiku instance, or for projects external to Dataiku.

To this end, you can define code libraries within Dataiku that contain reusable code, and you can connect these libraries to remote Git repositories.

For example, if you have code that was developed outside of Dataiku and is available in a Git repository (such as a library created by another team), you can import this repository (or a part of it) into the project libraries and use it in any code capability of Dataiku (such as recipes, notebooks, or webapps).

Note

Since the import is entirely based on Git, it works with any Git hosting service, such as GitHub, GitLab, or Bitbucket. The repository can be public or private, as long as Dataiku has been configured to have access to it.

Follow the tutorial below to try out how this works yourself.

Prerequisites#

  • Dataiku 12.0 or later.

  • A Full Designer user profile.

  • Familiarity with code in Dataiku.

  • Familiarity with the basics of Git.

Technical requirements#

Connect to a remote Git repository#

  1. From the top navigation bar of any Dataiku project, navigate to Code > Libraries to open the Library Editor.

    ../../_images/library-editor.png

    From here, you can develop new libraries, or you can decide to import an existing one from a remote Git repository.

  2. Click Git > Import from Git.

  3. In Repository, enter the URL for cloning the repository: https://github.com/dataiku/dss-plugin-sample-correlations.

  4. In Checkout, leave master as the branch to check out.

    This field can contain the name of a branch, a tag, or a commit hash. If you click the refresh button next to this field, Dataiku fetches the repository and lists the available branches.

  5. In Path in repository, enter python-lib.

    This field allows you to configure a path to a subfolder within the library repository. It is particularly useful when multiple libraries are stored in the same repository and you only need to import some of them, rather than importing the entire repository into your project.

  6. In Target path, enter the local path where the remote code will be stored: python/compute-corr.

  7. Click Save and Retrieve.

../../_images/import-from-git-dialog.png

You should now see the contents of the remote library in the Library Editor.

../../_images/library-cloned.png
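
To make the four dialog fields concrete, the sketch below models them as a small Python dataclass and prints roughly equivalent plain git commands. It is an illustration only: GitReference and its methods are hypothetical names, not part of any Dataiku API, and Dataiku's actual implementation may differ. The commit-hash check is a simple heuristic.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List


@dataclass(frozen=True)
class GitReference:
    """Hypothetical model of the Import from Git dialog fields."""

    repository: str    # URL used to clone the repository
    checkout: str      # branch name, tag, or commit hash
    path_in_repo: str  # subfolder to import ("" means the whole repository)
    target_path: str   # destination inside the project library

    def looks_like_commit_hash(self) -> bool:
        # Heuristic: 7 to 40 lowercase hex characters.
        # A branch that happens to be named "deadbeef" would be misclassified.
        return re.fullmatch(r"[0-9a-f]{7,40}", self.checkout) is not None

    def equivalent_git_commands(self, workdir: str = "tmp-clone") -> List[List[str]]:
        """Return plain git commands that would fetch the same revision."""
        if self.looks_like_commit_hash():
            # `git clone --branch` accepts branches and tags but not commit
            # hashes, so clone first and check out the commit afterwards.
            return [
                ["git", "clone", self.repository, workdir],
                ["git", "-C", workdir, "checkout", self.checkout],
            ]
        return [
            ["git", "clone", "--branch", self.checkout, "--depth", "1",
             self.repository, workdir]
        ]

    def source_dir(self, workdir: str = "tmp-clone") -> str:
        """Folder inside the clone whose contents map to target_path."""
        return str(PurePosixPath(workdir) / self.path_in_repo)


# The values entered in the tutorial steps above.
ref = GitReference(
    repository="https://github.com/dataiku/dss-plugin-sample-correlations",
    checkout="master",
    path_in_repo="python-lib",
    target_path="python/compute-corr",
)

for command in ref.equivalent_git_commands():
    print(" ".join(command))
print(ref.source_dir(), "->", ref.target_path)
```

Pinning Checkout to a tag or commit hash rather than a branch makes later updates reproducible, at the cost of having to change the reference by hand to pick up new code.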

The functions of the library can now be used in code in the Dataiku project by including an import statement such as:

from compute_corr import *
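
A wildcard import hides which names enter your namespace. The sketch below shows one way to inspect a module's public names and then import only what you need. Because the contents of the cloned library are not listed here, it uses a throwaway stand-in module: example_lib, pearson, and _mean are hypothetical names, and the temporary folder merely imitates a library folder on the Python path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical stand-in module (not the real cloned library).
MODULE_SOURCE = '''
def _mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = _mean(xs), _mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
'''

# A temporary folder plays the role of a library folder on the Python path.
library_root = Path(tempfile.mkdtemp())
(library_root / "example_lib.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(library_root))

# Without __all__, `from module import *` binds every name that does not
# start with an underscore; this list mirrors that rule.
module = importlib.import_module("example_lib")
public_names = [name for name in dir(module) if not name.startswith("_")]
print(public_names)

# An explicit import documents exactly what the calling code depends on.
from example_lib import pearson

print(pearson([1, 2, 3], [2, 4, 6]))
```

Explicit imports also make it easier to spot breakage after an update from Git, since a renamed or removed function fails at the import line rather than somewhere later in the recipe.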

Pulling updates from the remote repository#

The screenshot above displays a warning: because the code is part of a Git reference, any local changes will be lost at the next update from Git.

When code on the remote repository is updated, you can pull those updates to your local project library. From within the Library Editor:

  1. Click Git > Manage references.

  2. Click Update on each individual remote Git repository from which you want to pull updates.

  3. Alternatively, click Update All References to pull updates from every remote Git repo.

../../_images/library-update.png
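
Because an update overwrites local edits, it can help to know whether a library folder has changed since it was last retrieved. The sketch below is one possible approach, independent of Dataiku's own tooling: it hashes every file under a folder and compares two snapshots. snapshot and changed_files are hypothetical helpers, and the temporary folder stands in for a copy of the library folder that you have on disk.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def snapshot(folder: Path) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its contents."""
    return {
        path.relative_to(folder).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """List paths that were added, removed, or modified between snapshots."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )


# Stand-in for a library folder right after it was retrieved from Git.
library = Path(tempfile.mkdtemp())
(library / "compute.py").write_text("def f():\n    return 1\n")
baseline = snapshot(library)

# Simulate local edits made after the import.
(library / "compute.py").write_text("def f():\n    return 2\n")
(library / "notes.txt").write_text("scratch\n")

# Any path listed here would be at risk when the reference is updated.
print(changed_files(baseline, snapshot(library)))
```

An empty list means the folder still matches the baseline, so an update would not discard any local work.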

Note

Starting with Dataiku version 10.0, changes made to your local Dataiku project library can be pushed back to the remote Git repository. Visit the reference documentation to find out more.

What’s next?#

  • For more details about all the other ways Dataiku integrates with Git, including projects, libraries, and plugins, and how to configure Dataiku to access private repositories, see the reference documentation on working with Git.

  • Reusing code is key to collaboration. Consult the reference documentation to learn more about reusing Python or R code.