Tutorial | Code Environments (Developer part 3)#

Code environments address the problem of managing dependencies and versions when writing code in Python and R. They are similar to the Python virtualenv or to renv (for R users).

Note

Before beginning this tutorial, you may wish to review a code environment concept article.

Objectives#

In this tutorial, you will:

  • Create a code environment.

  • Set a code environment at the project, recipe, and notebook level.

Starting here?

If you skipped the previous sections and just want to focus on code environments, you need to:

  1. Satisfy the prerequisites.

  2. Create the project (+New Project > DSS tutorials > Developer > Code Environments) or download and import the zip file from this website.

  3. Build the Flow containing data from the fictional Haiku T-Shirt company.

One additional prerequisite is the permission to create, modify, and use code environments. These permissions can be assigned by an administrator of your Dataiku instance.

Tip

If you don’t have the necessary permissions to create code environments on your instance, you can complete this tutorial using the free edition or a trial of Dataiku Cloud.

Create a code environment#

Note

The process for creating code environments is slightly different for Dataiku Cloud. See the documentation for working with Python or R to learn how to create code environments on Dataiku Cloud.

A Dataiku administrator, or any user in a group with the proper permissions, can create different code environments.

There are two main ways to create a code environment:

  • Create a new Python or R environment from scratch and import various packages to it.

  • Import a previously created Python or R environment from a ZIP file into a Dataiku instance and optionally configure it further.

In this exercise, we will create a single code environment with all the required packages for the tutorials of the Developer learning path.

The steps differ depending on whether you are using Dataiku Cloud or a self-managed instance. Both sets of steps are given below, starting with Dataiku Cloud.

Create a new environment#

On Dataiku Cloud:

  1. From the Dataiku Cloud Launchpad, click the Code Envs menu in the left panel.

  2. On the top right corner, click + Add a python code environment.

  3. Enter py36-developer-v9 as the name of the environment and choose Python 3.6 as the version.

    Note

    For the purposes of this tutorial alone, you are welcome to create a code environment for another version of Python if you do not have Python 3.6 on your system. However, all of the courses in the Developer learning path were tested with Python 3.6 code environments.

On a self-managed instance:

  1. From the Dataiku homepage, click the Applications menu in the top navigation bar.

  2. Select Administration > Code Envs.

  3. Once in the Code Envs tab, click New Python Env in the upper right corner.

  4. Give it a descriptive name and select a Python version. We’ll be using py36-developer-v9 to indicate the Python and DSS versions.

    Note

    Here we are creating a Python 3.6 environment. As explained in the documentation, you’ll need the requested version of Python to be installed on your system in order to create that kind of code environment.

    For the purposes of this tutorial alone, you are welcome to create a code environment for another version of Python if you do not have Python 3.6 on your system. However, all of the courses in the Developer learning path were tested with Python 3.6 code environments.

  5. Leave the default deployment type, Managed by DSS (recommended).

  6. Leave the default settings for Conda, Mandatory packages, and Jupyter, and click Create.

    Dataiku screenshot of the dialog for creating a new code environment.
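
Because a self-managed instance needs the requested Python version to be installed on the system, it can help to check for the interpreter before creating the environment. The following is a minimal sketch using only the Python standard library. It assumes that the interpreter is exposed on the PATH under a name like python3.6 and that you run the script on the machine hosting Dataiku; the helper names find_python and reported_version are hypothetical and are not part of Dataiku.

```python
import platform
import shutil
import subprocess
import sys


def find_python(version="3.6"):
    """Return the path of a `pythonX.Y` executable found on PATH, or None."""
    return shutil.which("python" + version)


def reported_version(executable):
    """Ask an interpreter for its own version string (for example '3.6.15')."""
    result = subprocess.run(
        [executable, "-c", "import platform; print(platform.python_version())"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    path = find_python("3.6")
    if path is None:
        # Assumption: a missing executable on PATH means this version is not usable here.
        print("python3.6 not found on PATH; consider another Python version")
    else:
        print(path, reported_version(path))
    # For comparison, show the interpreter running this script.
    print("current interpreter:", sys.executable, platform.python_version())
```

If the interpreter is not found, the note above still applies: for this tutorial alone, you can choose another Python version that is available.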

Install packages#

On Dataiku Cloud:

  1. Paste the following list of additional packages into the Package to install code field (above Base Packages (Pip)):

    openpyxl
    matplotlib
    tabula-py
    altair
    bokeh
    dash
    nbformat==4.2.0
    plotly==4.14.3
    requests
    datetime
    

    Note

    Package presets allow you to quickly set up a working environment for your use case.

  2. Click Add to confirm.

  3. Wait for the code env to be uploaded.

    Dataiku screenshot of the successful upload of the code env.

  4. Click Close.

On a self-managed instance:

  1. Once the environment has been created, open it, and go to Currently installed packages. Notice that the mandatory packages are already installed.

  2. Go to Packages to install. This is where you can install additional packages.

  3. Under Requested packages, click on Add Sets of Packages and add the packages for Visual Machine Learning (scikit-learn, XGBoost).

    Note

    This set of packages differs slightly depending on the version of Dataiku.

  4. Paste the list of additional packages in the Requested packages (Pip) code field (below the auto-populated ones for visual machine learning):

    openpyxl
    matplotlib
    tabula-py
    altair
    bokeh
    dash
    nbformat==4.2.0
    plotly==4.14.3
    requests
    datetime

  5. Click Save and Update.
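
After the update finishes, you may want to confirm that the requested packages are actually present. The sketch below is one possible check, meant to be run in a notebook or recipe that uses the new code environment. It parses the same package list as above and compares any pinned versions against what is installed. The function names are hypothetical, and the fallback to pkg_resources is an assumption made to support older interpreters such as Python 3.6, where importlib.metadata is not available.

```python
REQUESTED = """\
openpyxl
matplotlib
tabula-py
altair
bokeh
dash
nbformat==4.2.0
plotly==4.14.3
requests
datetime
"""


def parse_requested(text):
    """Turn the pasted package list into (name, pinned_version_or_None) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pairs.append((name.strip(), version.strip() or None))
    return pairs


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for older interpreters

        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check(text):
    """Return (name, wanted, found, ok) rows for every requested package."""
    rows = []
    for name, wanted in parse_requested(text):
        found = installed_version(name)
        ok = found is not None and (wanted is None or found == wanted)
        rows.append((name, wanted, found, ok))
    return rows


if __name__ == "__main__":
    for name, wanted, found, ok in check(REQUESTED):
        print(name, "wanted:", wanted, "found:", found, "OK" if ok else "CHECK")
```

A row marked CHECK only means the package is missing or its version differs from the pin; the Currently installed packages view remains the reference for what the environment contains.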

Use code environments#

Once a code environment is created, many Dataiku objects can be configured to use this specific code environment.

Set a project-level code environment#

By default, projects inherit the code environment according to the global settings of the instance. You can check this in Administration > Settings > Misc. Unless otherwise specified, this is the DSS built-in environment.

In an individual project, you can set a different code environment to be used when processing code within that project.

  1. Navigate to the Code Environments project homepage.

  2. From the top navigation bar, open the More Options (“…”) menu and click Settings.

  3. In the left panel, select Code env selection.

  4. Under Default Python code env, change the mode to Select an environment. A dropdown will appear that allows you to select a different environment from those already created.

  5. Select the code environment that you created in the previous section and click Save.

Dataiku screenshot of the project settings page to select a code environment.

Set a code environment in a recipe#

By default, Python and R recipes use the project’s code environment. For each recipe, you can set a different code environment to be used when processing code within that recipe.

  1. From the Flow, double-click on the Python code recipe recipe_from_notebook_orders_analysis to open it.

  2. On the Advanced tab of the recipe, find the Python environment panel.

Notice that the code recipe has already inherited the environment you set at the project level.

You could change this if you need to use a different code environment.

Dataiku screenshot of the advanced tab of a Python recipe to select a code environment.
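
The inheritance described in the last two sections can be summarized as a simple fallback chain: a setting on the object itself (for example, a recipe) takes precedence, then the project-level setting, then the instance default. The sketch below is only a conceptual model of that rule for illustration. The function resolve_code_env and its parameters are hypothetical and do not correspond to a Dataiku API.

```python
def resolve_code_env(instance_default, project_env=None, object_env=None):
    """Pick the most specific code environment that has been set.

    Conceptual model only: None means "inherit from the level above".
    """
    for candidate in (object_env, project_env, instance_default):
        if candidate:
            return candidate
    raise ValueError("no code environment configured at any level")


if __name__ == "__main__":
    builtin = "DSS built-in environment"
    # Nothing set at the project or recipe level: the instance default applies.
    print(resolve_code_env(builtin))
    # Project-level setting, as configured in this tutorial.
    print(resolve_code_env(builtin, project_env="py36-developer-v9"))
    # A recipe-level override wins over the project setting.
    print(resolve_code_env(builtin, "py36-developer-v9", "another-env"))
```

Here another-env is a made-up environment name used only to show the override case.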

Set a code environment in a notebook#

From the Code menu, navigate to the Notebooks page. There is one Python notebook in this project: orders analysis.

By default, Python and R notebooks use the project’s code environment. However, if you created the orders analysis notebook in one of the previous hands-on tutorials and its Jupyter kernel is still active, the notebook will still be using the DSS built-in environment.

To switch the orders analysis notebook to the code environment you created:

  1. Open the notebook.

  2. From the notebook toolbar, click the Kernel dropdown, then Change kernel, and select the py36-developer-v9 environment you created from the list.

  3. Click Save.

Dataiku screenshot of a Python notebook choosing a kernel.
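
To confirm that the kernel change took effect, you can inspect the interpreter from a notebook cell. The following is a minimal sketch using only the standard library; the helper name describe_interpreter is hypothetical. On many installations the executable path contains the name of the code environment, but that is an assumption about the directory layout rather than a guarantee.

```python
import platform
import sys


def describe_interpreter():
    """Collect basic facts about the interpreter running this kernel."""
    return {
        "executable": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,  # root directory of the active environment
        "version": platform.python_version(),
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the reported version does not match the Python version you chose for py36-developer-v9, the notebook is probably still attached to the previous kernel.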

What’s next?#

In this tutorial, you have learned how to create code environments, add packages, and set the code environment in different Dataiku objects.

Note

It is also possible to set code environments in webapps and plugins. To learn more, see the reference documentation on Python and R code environments.