Hands-On: Code Environments¶
Dataiku DSS code environments address the problem of managing dependencies and versions when writing code in R and Python. They play a role similar to virtualenv for Python or renv for R.
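To make the virtualenv analogy concrete, the sketch below uses Python's standard `venv` module to create an isolated environment outside of DSS. It is only an illustration of the general idea, not a description of how DSS builds its own code environments; the directory name `demo-env` is hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(parent_dir, name="demo-env"):
    """Create a bare virtual environment and return its directory."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps the example fast and offline; a real environment
    # would normally enable pip so that packages can be installed into it.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_isolated_env(tmp)
        # pyvenv.cfg records which base interpreter the environment uses.
        print((env_dir / "pyvenv.cfg").read_text())
```

Each environment created this way has its own interpreter link and its own package directory, which is the same isolation that a code environment provides to a project, recipe, or notebook.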
This hands-on tutorial describes how to create a Python code environment and use it in projects, recipes, and notebooks. The code environment created here can subsequently be used for all courses in the Developer learning path.
Let’s Get Started¶
To complete this tutorial, you will need:
Some familiarity with Dataiku DSS objects, such as projects, recipes, and code notebooks;
Access to a Dataiku DSS instance, version 8.0 or above (Dataiku Online can also be used);
Permissions to create, modify, and use code environments. These permissions can be assigned by an administrator of your DSS instance.
If you don’t have the necessary permissions to create code environments on your instance, you can complete this tutorial using the free edition.
Create the Project¶
The first step is to create a Dataiku DSS Project. You can do this in one of the following ways:
Create a New Project¶
From the homepage, click +New Project > DSS Tutorials > Developer > Code Environments (Tutorial).
Continue From the Previous Hands-On Tutorial¶
If you are following the Academy “Code in Dataiku DSS” course and have already completed one of the hands-on lessons, you can begin this lesson by continuing with the same project you created earlier.
Create a Code Environment¶
If you are using Dataiku Online, you will not be able to create a new code environment. You can simply skip to the section “Use Code Environments” and use the “dash” code environment to complete the exercises.
A Dataiku DSS administrator, or any user in a group with the proper permissions, can create code environments.
There are two main ways to create a code environment:
Creating a new Python or R environment from scratch and then importing various packages into it;
Importing a previously created Python or R environment into a Dataiku DSS instance, and then optionally configuring it further.
In this exercise, we will create a single code environment with all the required packages for the hands-on tutorials of the Developer learning path.
From the Dataiku homepage, click the Applications menu in the top navigation bar. Select Administration > Code Envs.
You can choose to either create a new environment from scratch or import one from a ZIP file. We will choose the former.
Once in the Code Envs tab, click New Python Env in the upper right corner.
Give it a descriptive name and select a Python version. We’ll be using py36-developer-v9 to indicate the Python and DSS versions.
Here we are creating a Python 3.6 environment. As explained in the documentation, you’ll need the requested version of Python to be installed on your system in order to create that kind of code environment.
For the purposes of this tutorial alone, you are welcome to create a code environment for another version of Python if you do not have Python 3.6 on your system. However, all of the courses in the Developer learning path were tested with Python 3.6 code environments.
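If you are unsure whether the requested Python version is available, a quick check such as the following can help. This is a minimal sketch, assuming it is run on the machine that hosts DSS and that the interpreter is exposed on the `PATH` under the conventional name `python3.6`; both are assumptions, and your installation may differ.

```python
import shutil
import subprocess


def find_interpreter(command="python3.6"):
    """Return the full path of `command` if it is on the PATH, else None."""
    return shutil.which(command)


def interpreter_version(path):
    """Ask an interpreter for its version string, for example 'Python 3.6.15'."""
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # very old interpreters print to stderr
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    path = find_interpreter("python3.6")
    if path is None:
        print("python3.6 was not found on the PATH; install it or pick another version")
    else:
        print(path, "->", interpreter_version(path))
```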
Leave the default deployment type, Managed by DSS (recommended).
Leave the default settings for Conda, Mandatory packages, and Jupyter, and click Create.
Once the environment has been created, open it, and go to Installed packages. Notice that the mandatory packages are already installed.
Go to Packages to install.
This is where you can install additional packages.
Under Requested packages, click on Add Sets of Packages.
Add the packages for “Visual Machine Learning (scikit-learn, XGBoost)”.
This set of packages differs slightly for versions 8 and 9 of Dataiku DSS.
Then paste the list of additional packages in the Requested packages (Pip) code field (below the auto-populated ones for visual machine learning):
openpyxl
matplotlib
tabula-py
altair
bokeh
dash
nbformat==4.2.0
plotly==4.14.3
requests
datetime
Click Save and Update.
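The steps above use the interface, but the same result can in principle be scripted. The following is a hedged sketch, assuming the `dataikuapi` client exposes `create_code_env`, `get_definition`, `set_definition`, and `update_packages` as in its public API documentation, and that the requested pip packages are stored under the `specPackageList` key of the definition. Check these names against the API reference for your DSS version before relying on them. The function takes the client as an argument; the sketch does not add the “Visual Machine Learning” package set.

```python
REQUESTED_PACKAGES = [
    "openpyxl", "matplotlib", "tabula-py", "altair", "bokeh", "dash",
    "nbformat==4.2.0", "plotly==4.14.3", "requests", "datetime",
]


def create_developer_env(client, env_name="py36-developer-v9"):
    """Create a DSS-managed Python 3.6 code env and request extra packages.

    `client` is assumed to behave like a dataikuapi.DSSClient.
    """
    code_env = client.create_code_env(
        "PYTHON", env_name, "DESIGN_MANAGED", {"pythonInterpreter": "PYTHON36"}
    )
    definition = code_env.get_definition()
    # Keep whatever is already requested and append one package per line,
    # mirroring the "Requested packages (Pip)" field in the interface.
    existing = (definition.get("specPackageList") or "").strip()
    lines = ([existing] if existing else []) + REQUESTED_PACKAGES
    definition["specPackageList"] = "\n".join(lines)
    code_env.set_definition(definition)
    # Equivalent of clicking "Save and Update": install the requested packages.
    code_env.update_packages()
    return code_env


# Possible usage from inside a DSS notebook (requires suitable permissions):
#   import dataiku
#   create_developer_env(dataiku.api_client())
```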
Use Code Environments¶
Once a code environment is created, many Dataiku objects can be configured to use this specific code environment.
Set a Project-Level Code Environment¶
By default, projects inherit the code environment according to the global settings of the instance (you can check this in Administration > Settings > Misc.). Unless otherwise specified, this is the DSS builtin environment.
In an individual project, you can set a different code environment to be used when processing code within that project.
Navigate to the Code in Dataiku DSS (Tutorial) project homepage.
From the top navigation bar, go to the More Options (“…”) menu > Settings > Code env selection.
For the default Python code env:
Change the mode to Select an environment.
A dropdown appears that allows you to select a different environment from those already created.
Select the code environment that you created in the previous step.
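If you prefer to script this setting, a minimal sketch using the Dataiku public API client might look like the following. It assumes that the project settings object returned by `get_settings()` offers a `set_python_code_env` method, as documented for recent versions of `dataikuapi`; the project key `MYPROJECT` in the usage comment is hypothetical.

```python
def set_project_python_env(client, project_key, env_name):
    """Make `env_name` the default Python code env of a project.

    `client` is assumed to behave like a dataikuapi.DSSClient.
    """
    project = client.get_project(project_key)
    settings = project.get_settings()
    # Equivalent of choosing "Select an environment" and picking env_name.
    settings.set_python_code_env(env_name)
    # Nothing is persisted until the settings are saved.
    settings.save()
    return settings


# Possible usage from inside a DSS notebook:
#   import dataiku
#   set_project_python_env(dataiku.api_client(), "MYPROJECT", "py36-developer-v9")
```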
Set a Code Environment in a Recipe¶
By default, Python and R recipes use the project’s code environment. For each recipe, you can set a different code environment to be used when processing code within that recipe.
From the Flow, open the Python code recipe.
On the Advanced tab of the recipe, find the Python environment panel.
Notice that the code recipe has already inherited the environment you set at the project level.
You could change this if you need to use a different code environment.
Set a Code Environment in a Notebook¶
From the Code menu, navigate to the Notebooks page. There is one Python notebook in this project, orders analysis.
By default, Python and R notebooks use the project’s code environment. However, if you created the “orders analysis” notebook in one of the previous hands-on tutorials and its Jupyter kernel is still active, it will still be using the DSS builtin environment.
To switch the orders analysis notebook to the code environment you created:
Open the notebook.
Click Kernel > Change kernel and select the environment you created from the list.
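After changing the kernel, you can confirm from a notebook cell that the new environment is actually in use. The sketch below prints the interpreter behind the kernel and compares the two packages pinned earlier against what is installed. The `check_pins` helper is illustrative rather than part of any Dataiku API, and the `pkg_resources` fallback is there because `importlib.metadata` is not available on Python 3.6.

```python
import sys

try:  # Python 3.8 and later
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.6/3.7: fall back to setuptools' pkg_resources
    from pkg_resources import DistributionNotFound as PackageNotFoundError
    from pkg_resources import get_distribution

    def version(name):
        return get_distribution(name).version


# Versions pinned in the "Requested packages (Pip)" field of this tutorial.
PINNED = {"nbformat": "4.2.0", "plotly": "4.14.3"}


def check_pins(pins):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Python:", ".".join(str(part) for part in sys.version_info[:3]))
    for name, (found, ok) in check_pins(PINNED).items():
        print(name, found, "OK" if ok else "MISMATCH")
```

If the interpreter path or the package versions do not match what you expect, the kernel is probably still attached to a different environment.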
In this tutorial, you have learned how to create code environments, add packages, and set the code environment in different Dataiku DSS objects. It is also possible to set code environments in webapps and plugins. To learn more about this, see: