Tutorial | Code Environments (Developer part 3)
Dataiku code environments address the problem of managing dependencies and versions when writing code in Python and R. They are similar to the Python virtualenv or to renv (for R users).
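To make the virtualenv comparison concrete, here is a minimal sketch that creates an isolated Python environment with the standard-library venv module. It does not involve Dataiku at all; the function name create_isolated_env and the environment name py-demo-env are illustrative assumptions.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(parent_dir: str, name: str) -> Path:
    """Create a bare virtual environment and return its directory.

    Hypothetical helper for illustration only; Dataiku manages its own
    code environments through the Administration UI.
    """
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps the sketch fast and avoids any network access.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_isolated_env(tmp, "py-demo-env")
        # pyvenv.cfg marks the directory as a self-contained environment.
        print((env_dir / "pyvenv.cfg").is_file())
```

Each environment created this way has its own interpreter configuration and package directory, which is the same isolation idea that a Dataiku code environment provides.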
Before beginning this tutorial, you may wish to review a code environment concept article.
In this tutorial, you will:
create a code environment;
set a code environment at the project, recipe and notebook level.
If you skipped the previous sections and just want to focus on code environments, you need to:
One additional prerequisite is the permission to create, modify, and use code environments. These permissions can be assigned by an administrator of your Dataiku instance.
If you don’t have the necessary permissions to create code environments on your instance, you can complete this tutorial using the free edition or a trial of Dataiku Cloud.
A Dataiku administrator, or any user in a group with the proper permissions, can create different code environments.
There are two main ways to create a code environment:
Creating a new Python or R environment from scratch and then importing various packages into it;
Importing a previously created Python or R environment into a Dataiku instance, and then optionally configuring it further.
In this exercise, we will create a single code environment with all the required packages for the hands-on tutorials of the Developer learning path.
From the Dataiku homepage, click the Applications menu in the top navigation bar. Select Administration > Code Envs.
You can choose to either create a new environment from scratch or import one from a ZIP file. We will choose the former.
Once in the Code Envs tab, click New Python Env in the upper right corner.
Give it a descriptive name and select a Python version. We’ll be using py36-developer-v9 to indicate the Python and DSS versions.
Here we are creating a Python 3.6 environment. As explained in the documentation, you’ll need the requested version of Python to be installed on your system in order to create that kind of code environment.
For the purposes of this tutorial alone, you are welcome to create a code environment for another version of Python if you do not have Python 3.6 on your system. However, all of the courses in the Developer learning path were tested with Python 3.6 code environments.
Leave the default deployment type, Managed by DSS (recommended).
Leave the default settings for Conda, Mandatory packages, and Jupyter, and click Create.
Once the environment has been created, open it, and go to Currently installed packages. Notice that the mandatory packages are already installed.
Go to Packages to install.
This is where you can install additional packages.
Under Requested packages, click on Add Sets of Packages.
Add the packages for “Visual Machine Learning (scikit-learn, XGBoost)”.
This set of packages differs slightly depending on the version of Dataiku.
Then paste the list of additional packages in the Requested packages (Pip) code field (below the auto-populated ones for visual machine learning):
openpyxl
matplotlib
tabula-py
altair
bokeh
dash
nbformat==4.2.0
plotly==4.14.3
requests
datetime
Click Save and Update.
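After the update finishes, you may want to confirm that the pinned versions were actually installed. The sketch below parses the requested package list from this tutorial and compares it against a lookup of installed versions. The helper names (parse_requested_packages, find_problems) are hypothetical, and the lookup is passed in as a parameter: on Python 3.8+ it could be built on importlib.metadata.version, while a Python 3.6 environment would need another source such as the Currently installed packages list.

```python
from typing import Callable, Dict, Optional

# The additional packages requested in this tutorial, one per line.
REQUESTED_PACKAGES = """
openpyxl
matplotlib
tabula-py
altair
bokeh
dash
nbformat==4.2.0
plotly==4.14.3
requests
datetime
"""


def parse_requested_packages(spec: str) -> Dict[str, Optional[str]]:
    """Map each package name to its pinned version, or None if unpinned."""
    packages: Dict[str, Optional[str]] = {}
    for raw_line in spec.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        name, sep, version = line.partition("==")
        packages[name.strip().lower()] = version.strip() if sep else None
    return packages


def find_problems(
    requested: Dict[str, Optional[str]],
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Report packages that are missing or whose pinned version differs.

    `lookup` returns the installed version of a package, or None if absent.
    """
    problems: Dict[str, str] = {}
    for name, pinned in requested.items():
        found = lookup(name)
        if found is None:
            problems[name] = "not installed"
        elif pinned is not None and found != pinned:
            problems[name] = "expected {}, found {}".format(pinned, found)
    return problems


if __name__ == "__main__":
    requested = parse_requested_packages(REQUESTED_PACKAGES)
    # Made-up installed versions, used only to exercise the check.
    sample_installed = {"plotly": "4.14.3", "nbformat": "5.0.0"}
    print(find_problems(requested, sample_installed.get))
```

Only exact pins (==) are handled here; other specifiers such as >= would need a real requirements parser.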
Once a code environment is created, many Dataiku objects can be configured to use this specific code environment.
Set a project-level code environment
By default, projects inherit the code environment according to the global settings of the instance (you can check this in “Administration > Settings > Misc.”). Unless otherwise specified, this is the DSS builtin environment.
In an individual project, you can set a different code environment to be used when processing code within that project.
Navigate to the Code in Dataiku project homepage.
From the top navigation bar, go to the menu More Options (“…”) > Settings > Code env selection.
For the default Python code env:
Change the mode to Select an environment.
A dropdown appears that allows you to select a different environment from those already created.
Select the code environment that you created in the previous step.
Set a code environment in a recipe
By default, Python and R recipes use the project’s code environment. For each recipe, you can set a different code environment to be used when processing code within that recipe.
From the Flow, open the Python code recipe.
On the Advanced tab of the recipe, find the Python environment panel.
Notice that the code recipe has already inherited the environment you set at the project level.
You could change this if you need to use a different code environment.
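The inheritance described so far (recipe, then project, then instance default) can be summarized in a few lines. This is only a conceptual model of the fallback order described in this tutorial, not Dataiku's implementation or API; the function resolve_code_env and the constant BUILTIN_ENV are hypothetical names.

```python
from typing import Optional

# Assumed label for the instance-wide default described in this tutorial.
BUILTIN_ENV = "DSS builtin environment"


def resolve_code_env(
    recipe_env: Optional[str] = None,
    project_env: Optional[str] = None,
    instance_default: str = BUILTIN_ENV,
) -> str:
    """Return the environment a recipe would use under the fallback order.

    A setting of None means "inherit from the next level up".
    """
    for candidate in (recipe_env, project_env):
        if candidate is not None:
            return candidate
    return instance_default


if __name__ == "__main__":
    # Nothing set anywhere: the instance default applies.
    print(resolve_code_env())
    # Project-level setting only: the recipe inherits it.
    print(resolve_code_env(project_env="py36-developer-v9"))
    # Recipe-level override wins over the project setting.
    print(resolve_code_env(recipe_env="other-env", project_env="py36-developer-v9"))
```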
Set a code environment in a notebook
From the Code menu, navigate to the Notebooks page. There is one Python notebook in this project, orders analysis.
By default, Python and R notebooks use the project’s code environment. However, if you created the “orders analysis” notebook in one of the previous hands-on tutorials and its Jupyter kernel is still active, the notebook is still using the DSS builtin environment.
To set the code environment you created for the orders analysis notebook:
Open the notebook.
Click Kernel > Change kernel and select the environment you created from the list.
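Once the kernel has changed, a quick way to check which interpreter the notebook is running on is to inspect it from a cell. The sketch below uses only the standard library; the helper names are illustrative, and the expected version (3, 6) assumes you created the Python 3.6 environment used in this tutorial.

```python
import sys
from typing import Dict, Sequence


def describe_interpreter() -> Dict[str, str]:
    """Summarize the interpreter that is executing this code."""
    return {
        "executable": sys.executable,  # path to the running Python binary
        "prefix": sys.prefix,          # root directory of the active environment
        "version": "{}.{}".format(*sys.version_info[:2]),
    }


def is_expected_python(
    expected: Sequence[int],
    version_info: Sequence[int] = sys.version_info,
) -> bool:
    """Check that the major and minor version match the expected pair."""
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key, "=", value)
    # Assumes the Python 3.6 environment from this tutorial.
    print("Python 3.6 kernel:", is_expected_python((3, 6)))
```

If the reported paths still point to the builtin environment, the old kernel is probably still active; restarting it after selecting the new environment should pick up the change.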
In this tutorial, you have learned how to create code environments, add packages, and set the code environment in different Dataiku objects.
It is also possible to set code environments in webapps and plugins. To learn more, see the reference documentation on Python and R code environments.