Cloning a Library from a Remote Git Repository

Can you import code from Git to be used within a Dataiku DSS project? Yes!

An important end goal of writing code is to be able to reuse it, whether within a DSS project, across projects within a DSS instance, or for projects external to DSS.

To this end, you can define code libraries within Dataiku DSS that contain reusable code, and you can connect these libraries to remote git repositories.

For example, if you have code that was developed outside of DSS and is available in a Git repository (such as a library created by another team), you can import that repository, or part of it, into the project libraries and use it in any of DSS's code capabilities, such as recipes, notebooks, or webapps.


Note

Since the import relies entirely on Git, it works with any Git hosting service, such as GitHub, GitLab, or Bitbucket. The repository can be public or private, as long as Dataiku DSS has been configured to access it.

Follow the tutorial below to try this yourself.

Prerequisites

  • Familiarity with code in Dataiku DSS

  • Familiarity with the basics of Git

Technical Requirements

Connect to a Remote Git Repository

  • From the top navigation bar of any DSS project, navigate to Code > Libraries to open the Library Editor.

(Screenshot: the Library Editor)

From here, you can develop new libraries or import an existing one from a remote Git repository.

  • Click Git > Import from Git.

The Repository field must contain the URL used to clone the repository.

  • Enter https://github.com/dataiku/dss-plugin-sample-correlations as the repository.

The Checkout field can contain the name of a branch, a tag, or a commit hash. If you click the refresh button next to this field, Dataiku DSS fetches the repository and lists the available branches.

  • Leave master as the branch to check out.
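The branch list shown by the refresh button corresponds roughly to what `git ls-remote --heads <url>` reports on the command line. As a minimal sketch, assuming Git's usual output format of one `<commit hash><TAB>refs/heads/<branch>` line per branch, the hypothetical helpers below extract branch names. They illustrate the idea only and are not how DSS implements the refresh.

```python
import subprocess


def parse_ls_remote_heads(output: str) -> list[str]:
    """Extract branch names from `git ls-remote --heads` output (hypothetical helper)."""
    prefix = "refs/heads/"
    branches = []
    for line in output.splitlines():
        # Each non-empty line has the form "<commit hash>\t<ref name>"
        _commit, _, ref = line.strip().partition("\t")
        if ref.startswith(prefix):
            branches.append(ref[len(prefix):])
    return branches


def list_remote_branches(url: str) -> list[str]:
    """Run git (which must be installed) and return the remote's branch names."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", url],
        capture_output=True, text=True, check=True,
    )
    return parse_ls_remote_heads(result.stdout)


# Illustrative output with made-up, shortened hashes (no network needed)
sample = "1a2b3c4d\trefs/heads/develop\n5e6f7a8b\trefs/heads/master\n"
print(parse_ls_remote_heads(sample))  # ['develop', 'master']
```

On a machine with Git and network access, `list_remote_branches("https://github.com/dataiku/dss-plugin-sample-correlations")` would run the real query instead of parsing the sample text.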

Path in repository lets you specify a subfolder of the remote repository to import. This is particularly useful when a repository holds several libraries and you only need some of them, rather than the entire repository, in your project.

  • Enter python-lib as the path in repository.

Target path sets the local path, within the project library, where the remote code will be stored.

  • Enter python/compute-corr as the target path.
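Together, Path in repository and Target path map one folder of the remote repository onto one folder of the project library. The sketch below illustrates that mapping on a local stand-in checkout. The helper `import_subfolder` and the module `example_module.py` are hypothetical, and the code only mimics the effect of the two fields; it is not DSS's implementation.

```python
import shutil
import tempfile
from pathlib import Path


def import_subfolder(repo_dir: Path, path_in_repo: str,
                     library_root: Path, target_path: str) -> Path:
    """Copy one subfolder of a local checkout into a library folder (hypothetical)."""
    source = repo_dir / path_in_repo           # e.g. <checkout>/python-lib
    destination = library_root / target_path   # e.g. <library>/python/compute-corr
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(source, destination)       # fails if destination already exists
    return destination


def demo_import() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Stand-in checkout; file names are illustrative only
        checkout = root / "checkout"
        (checkout / "python-lib").mkdir(parents=True)
        (checkout / "python-lib" / "example_module.py").write_text("ANSWER = 42\n")
        (checkout / "README.md").write_text("outside python-lib, not imported\n")

        dest = import_subfolder(checkout, "python-lib", root / "lib", "python/compute-corr")
        return sorted(p.relative_to(root).as_posix() for p in dest.rglob("*"))


print(demo_import())  # ['lib/python/compute-corr/example_module.py']
```

Only the contents of `python-lib` end up under the target path. Files elsewhere in the repository, such as the README, are left out.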

  • Click Save and Retrieve.

(Screenshot: the Import from Git dialog)

You should now see the contents of the remote library in the Library Editor.

(Screenshot: the cloned library in the Library Editor)

The library's functions can now be used in the project's code by including an import statement such as:

from compute_corr import *
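In Python, an import like this works only if the folder containing the module is on the module search path (`sys.path`). The sketch below illustrates that general mechanism outside DSS. It assumes a hypothetical module `example_corr` with a hypothetical function `double`. It does not show how DSS configures its own library paths, and the module names in the real sample repository may differ.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_folder(folder: Path, module_name: str):
    """Put `folder` on Python's search path, then import `module_name` from it."""
    if str(folder) not in sys.path:
        sys.path.insert(0, str(folder))
    importlib.invalidate_caches()  # the module file was created after startup
    return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout mirroring the target path python/compute-corr.
    # The folder itself goes on the path because "compute-corr" is not a
    # valid identifier, so it cannot appear in a regular import statement.
    folder = Path(tmp) / "python" / "compute-corr"
    folder.mkdir(parents=True)
    (folder / "example_corr.py").write_text("def double(x):\n    return 2 * x\n")

    module = import_from_folder(folder, "example_corr")
    sys.path.remove(str(folder))  # tidy up the demo's path change

print(module.double(21))  # 42
```

In your own code, importing specific names (for example `from example_corr import double`) rather than `*` makes it clearer where each function comes from.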

Pulling Updates from the Remote Repository

The screenshot above shows a warning: because the code belongs to a Git reference, any local changes to it will be lost the next time it is updated from Git.

When code on the remote repository is updated, you can pull those updates to your local project library. From within the Library Editor:

  • Click Git > Manage references.

  • Click Update for each remote Git repository whose updates you want to pull.

  • Alternatively, click Update All References to pull updates from all remote Git repositories at once.
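The warning about losing local changes follows from how an update behaves: the library copy is replaced with the current remote content. The minimal sketch below models that replace-on-update behavior with a hypothetical `update_reference` helper and illustrative file contents, and shows a local edit being discarded. DSS's actual update mechanism may differ in its details.

```python
import shutil
import tempfile
from pathlib import Path


def update_reference(fresh_checkout: Path, destination: Path) -> None:
    """Replace the library copy with the latest remote content (hypothetical)."""
    if destination.exists():
        shutil.rmtree(destination)  # any local edits are discarded here
    shutil.copytree(fresh_checkout, destination)


def demo_update() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Stand-in for the latest remote version of the library subfolder
        remote = root / "remote" / "python-lib"
        remote.mkdir(parents=True)
        (remote / "example_module.py").write_text("VERSION = 2\n")

        # Local library copy that someone edited in place
        local = root / "lib" / "python" / "compute-corr"
        local.mkdir(parents=True)
        (local / "example_module.py").write_text("VERSION = 1  # edited locally\n")

        update_reference(remote, local)
        return (local / "example_module.py").read_text()


print(demo_update())  # prints "VERSION = 2"
```

Because the update overwrites the local folder, changes that need to survive should be made in the remote repository and then pulled.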

(Screenshot: updating the library from Git)

Note

Changes made to your local Dataiku DSS project library cannot be pushed back to the remote Git repository.

What’s next?

  • For more details about all the other ways Dataiku DSS integrates with Git, including projects, libraries, and plugins, and how to configure DSS to access private repositories, see the product documentation on working with Git.

  • Reusing code is key to collaboration. Consult the product documentation to learn more about reusing Python or R code.