Concept: Code Environments in Dataiku
Code environments address the problem of managing dependencies and versions of software libraries when writing code.
A Dataiku code environment is a standalone and self-contained environment to run Python or R code. It is similar to the Python virtualenv or to renv (for R users).
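The virtualenv analogy can be made concrete with Python's standard venv module. The sketch below is plain Python, not Dataiku's own mechanism, and the function name create_isolated_env is hypothetical. It only illustrates what "standalone and self-contained" means in practice: each environment lives in its own directory with its own configuration file.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(parent_dir=None):
    """Create a bare virtual environment and return its directory."""
    # Use a throwaway directory unless the caller provides one.
    parent = Path(parent_dir) if parent_dir else Path(tempfile.mkdtemp())
    env_dir = parent / "demo-env"
    # with_pip=False keeps the example fast; no packages are installed.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir
```

The returned directory contains a pyvenv.cfg file that marks it as a separate environment. Packages installed there would not affect any other environment.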
Create a Code Environment
Dataiku enables you to create a new environment from scratch or import your own pre-built environment. The administrator of your Dataiku instance can assign the needed permissions to create, modify, and use code environments.
For a practical exercise of creating and configuring code environments, check out this tutorial.
Create a New Python or R Environment
To create a new code environment, you need to navigate to Administration > Code Envs, and then click New Python Env or New R Env.
You then need to select the deployment type. You can deploy the environment:
as an environment managed by Dataiku;
as a non-managed path; or
as a named external Conda environment.
To ensure a smooth deployment and optimal usage, the best practice is to deploy it as an environment managed by Dataiku.
When creating a new Python code environment, you also need to select a Python version from those supported by Dataiku. You can select a version that is available in PATH or use Conda.
By default, Dataiku will install the mandatory sets of Dataiku packages, as well as Jupyter notebook support packages. It’s recommended to leave these settings on, as you wouldn’t be able to use the Dataiku APIs or Jupyter Notebooks without them.
Finally, the name of your code environment should be descriptive and must be globally unique to the Dataiku instance. When working with Python environments, it’s a best practice to indicate the Python version in the name.
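As an illustration of the naming advice, the sketch below builds a name that embeds the Python version and checks it against names already in use. The pattern (py, then the version digits, then a short purpose) is one possible convention and not a Dataiku rule. The helper names are hypothetical.

```python
import re


def code_env_name(purpose, python_version):
    """Build a name such as 'py39_nlp_models' from a purpose and a version."""
    major, minor = python_version.split(".")[:2]
    # Keep only lowercase letters, digits, and underscores in the purpose.
    slug = re.sub(r"[^a-z0-9]+", "_", purpose.lower()).strip("_")
    return f"py{major}{minor}_{slug}"


def is_unique(name, existing_names):
    """Names must be unique on the instance; compare against known names."""
    return name not in set(existing_names)
```

For example, code_env_name("NLP models", "3.9.18") yields py39_nlp_models, which tells a reader both what the environment is for and which interpreter it uses.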
Import a Python or R Environment
Dataiku also allows you to import your own pre-built code environment by selecting a file on your computer. To do this, you need to select Import Env from the Code Envs page.
If you decide to import an environment, make sure it has all of the mandatory Dataiku packages.
Manage Code Environments
Once you have created your code environment, it appears on the Code Envs page. This page lists all of the environments in your Dataiku instance to which you have access, along with their names, languages, owners, deployment types, and whether they are attached to Jupyter kernels.
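A similar overview can be retrieved from code. The sketch below assumes a Dataiku public API client (for example, the object returned by dataiku.api_client()) that exposes list_code_envs() and returns one dictionary per environment with envName, envLang, and deploymentMode keys. Treat those method and key names as assumptions to verify against the API reference for your version. The helper summarize_code_envs is hypothetical.

```python
def summarize_code_envs(client, lang=None):
    """Return sorted (name, language, deployment mode) tuples.

    `client` is assumed to provide list_code_envs() returning dictionaries
    with 'envName', 'envLang', and 'deploymentMode' keys (an assumption).
    """
    rows = []
    for env in client.list_code_envs():
        # Optionally keep only one language, e.g. "PYTHON" or "R".
        if lang is not None and env.get("envLang") != lang:
            continue
        rows.append(
            (env.get("envName"), env.get("envLang"), env.get("deploymentMode"))
        )
    # Sort by name so the output is stable between calls.
    return sorted(rows, key=lambda row: str(row[0]))
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object before pointing it at a real instance.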
Each code environment has its own set of packages. Environments in Dataiku are independent: you can install different sets of packages, or different versions of packages, in different environments without any interaction between them. To install packages in your code environment, navigate to Packages to Install.
Here, you will find a list of Base Packages, which correspond to the mandatory and recommended packages that you selected to install when creating the environment. These packages are required by your current settings. Therefore, they cannot be removed, and you cannot modify their version constraints.
In the Requested packages section below, you can type in the packages you wish to install along with their versions line by line, as you would for a requirements.txt file.
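To make the one-package-per-line format concrete, the sketch below holds an illustrative Requested packages value and splits each line into a package name and a version constraint. The package list is hypothetical. The parser is a simplified illustration and does not cover the full pip requirements syntax.

```python
import re

# Hypothetical content for the Requested packages field.
REQUESTED = """
pandas>=1.3,<2.0
scikit-learn==1.2.2
requests
"""

# A package name, followed by an optional version constraint.
_LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def parse_requested(text):
    """Return (package, version_constraint) pairs, one per non-empty line."""
    pairs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"Unrecognized requirement line: {raw!r}")
        pairs.append((match.group(1), match.group(2).strip()))
    return pairs
```

A line without a constraint, such as requests, leaves the version open, while == pins an exact version and a range such as >=1.3,<2.0 bounds it.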
When using the Requested packages field in Python code environments, packages are installed through pip. For packages that are not available through pip, you need to download the source code to the DSS server and add the path of the source file to the Requested packages field.
If using a Python code environment for visual machine learning or deep learning, Dataiku can automatically populate the Requested packages field with the required packages for your use case.
When finished, click Save and Update. You can see the list of packages that have been successfully installed and their versions in the Installed Packages panel.
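The installed versions can also be checked from code that runs in the environment, such as a notebook cell. The sketch below uses only the Python standard library (importlib.metadata, available in Python 3.8 and later). The helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the running environment.
            versions[name] = None
    return versions
```

Calling installed_versions(["pandas", "scikit-learn"]) in a notebook that uses the environment should report the same versions as the Installed Packages panel, assuming both refer to the same environment.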
Set Code Environments in Dataiku Objects
Once you have created a code environment and installed the packages you need, you can configure different Dataiku objects to use the environment of your choice.
You can do this for projects, code notebooks, code recipes, webapps, and plugins.
Set a Code Environment in a Project
By default, Dataiku projects inherit the code environment specified in the global settings of the Dataiku instance. If no default environment has been specified in the global settings, projects use the DSS built-in environment.
If you want to change the default Python or R environment to another one that will be used across a given project, you can modify the project-level settings. You can do this from the project Settings menu, by opening the Code Env Selection panel.
Code notebooks, recipes, and webapps are initially set up to inherit the default project-level code environment. If you need to use different environments for different objects, you can also change the environment on the individual object level.
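The inheritance described above amounts to a lookup from the most specific setting to the least specific one. The sketch below is a simplified model of that order, not Dataiku's internal logic. The function name, the parameter names, and the environment names in the example are hypothetical.

```python
BUILTIN_ENV = "DSS built-in environment"


def resolve_code_env(object_env=None, project_env=None, instance_default=None):
    """Return the first environment that is set, most specific first."""
    # Order: individual object, then project, then instance-wide default.
    for candidate in (object_env, project_env, instance_default):
        if candidate:
            return candidate
    # Nothing was configured at any level.
    return BUILTIN_ENV
```

For example, resolve_code_env(project_env="py39_nlp") returns py39_nlp for a recipe with no setting of its own, while resolve_code_env() falls back to the built-in environment.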
Set a Code Environment in a Code Notebook
To change the code environment of a code notebook, you need to change the Kernel.
Set a Code Environment in a Code Recipe
You can change the code environment in a code recipe from the Advanced tab.
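After switching the environment of a recipe or notebook, a quick check from inside the code can confirm which interpreter is actually running. The sketch below uses only the standard library. It assumes that each code environment has its own interpreter prefix, as a virtualenv does, and the helper name describe_runtime is hypothetical.

```python
import sys


def describe_runtime():
    """Report which interpreter and environment prefix the code runs in."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
    }


print(describe_runtime())
```

If the reported Python version or prefix does not match the environment you selected, the object is probably still inheriting the project-level setting.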
Set a Code Environment in a Webapp
For webapps, you need to enable the backend first, after which you will be able to change the code environment from the Settings tab.
Set a Code Environment in a Plugin
By default, plugins will use the DSS built-in code environment. However, it is good practice to create a dedicated code environment for each plugin. That way, the plugin can be used on other Dataiku instances.
This article introduced the concept of code environments in Dataiku, how to create, import and manage code environments, and how to set them in various Dataiku objects. To learn more:
read the product documentation on Python and R code environments;
follow this hands-on tutorial on creating and setting code environments.