Work Environment

Learn how to set up your environment for coding with Dataiku, whether you prefer native code environments, an IDE integration, or Code Studios.

How-to | Set a code environment

This how-to describes how to set Python or R code environments for plugins, projects, and recipes (or other objects within a project).

Note

Your Dataiku administrator, or any user in a group with the proper permissions, can create different code environments, which are then available for you to use.

Set a project-level code environment

By default, projects inherit their code environment from the global settings of the instance (Administration > Settings > Misc.). Unless an administrator has changed this setting, it is the Dataiku built-in environment.

  • From the top navigation bar, go to … > Settings > Code env selection.

For the default Python or R code env:

  • Change the mode to Select an environment.

  • A dropdown appears that allows you to select a different environment from those already created.

Screenshot of the Code env selection settings of a project.

If there is an environment you expect to see that is missing, contact your administrator. They may need to create a new code environment or give you permission to use an existing one.
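You can also change the same project-level setting through the Dataiku public Python API (dataikuapi). The following is a minimal sketch that assumes your dataikuapi version exposes `DSSProjectSettings.set_python_code_env()`. Check the API reference for your Dataiku version. The helper name `set_project_python_env`, the URL, the project key, and the environment name are all hypothetical.

```python
# Minimal sketch: set a project's default Python code env through the public API.
# Assumes dataikuapi is installed and that DSSProjectSettings provides
# set_python_code_env(); verify against the API reference for your version.


def set_project_python_env(client, project_key, env_name):
    """Select env_name as the project's default Python code env and save it.

    `client` is expected to be a dataikuapi.DSSClient, or any object with the
    same get_project() interface.
    """
    settings = client.get_project(project_key).get_settings()
    settings.set_python_code_env(env_name)  # choose env_name as the project default
    settings.save()                         # persist the change on the instance
    return settings


# Example usage (hypothetical URL, API key, project key, and env name):
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com:11500", "your-API-key-secret")
#   set_project_python_env(client, "MY_PROJECT", "py39_ml")
```

The environment must already exist, and you must have permission to use it. This holds whether you select it in the UI or through the API.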

Tip

If you plan to use visual machine learning, the project-level code environment must include the scipy, scikit-learn, jinja2, and xgboost packages.
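To confirm that a code environment meets this requirement, you can run a short check in a notebook or recipe that uses it. This sketch uses only the Python standard library. The mapping from package names to import names (for example, scikit-learn is imported as `sklearn`) is written out by hand, and the helper name `missing_packages` is hypothetical.

```python
# Quick check, run from a notebook or recipe that uses the code env under test:
# which of the packages that visual ML needs cannot be imported here?
import importlib.util

# pip distribution name -> top-level import name (scikit-learn imports as sklearn)
VISUAL_ML_PACKAGES = {
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "jinja2": "jinja2",
    "xgboost": "xgboost",
}


def missing_packages(packages=VISUAL_ML_PACKAGES):
    """Return the distribution names whose import name cannot be found."""
    return [
        dist_name
        for dist_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

An empty result means all four packages are importable from the current environment. It does not check their versions.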

Set a recipe’s code environment

By default, Python and R recipes use the project’s code environment. For each recipe, you can set a different code environment to be used when processing code within that recipe.

  • On the Advanced tab of a recipe, find the Python or R environment panel.

  • Adjust the “Selection behavior” to select an environment as needed.

Screenshot of the Python environment panel on the Advanced tab of a recipe.
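The recipe-level selection can also be scripted. The following sketch assumes that the settings object returned for a code recipe provides `set_code_env(code_env=..., inherit=...)`, as documented for recent dataikuapi versions. The helper `set_recipe_code_env` and the names in the usage comment are hypothetical.

```python
# Minimal sketch: choose the code env for one Python or R recipe via the public API.
# Assumes dataikuapi's CodeRecipeSettings.set_code_env(code_env=..., inherit=...);
# verify against the API reference for your Dataiku version.


def set_recipe_code_env(project, recipe_name, env_name=None):
    """Pin recipe_name to env_name, or inherit the project env if env_name is None."""
    settings = project.get_recipe(recipe_name).get_settings()
    if env_name is None:
        settings.set_code_env(inherit=True)       # follow the project's code env
    else:
        settings.set_code_env(code_env=env_name)  # use this specific code env
    settings.save()
    return settings


# Example usage (hypothetical names):
#   project = client.get_project("MY_PROJECT")
#   set_recipe_code_env(project, "compute_orders", "py39_ml")
```

Passing no environment name restores the default behavior described above, in which the recipe uses the project's code environment.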

Set a notebook’s code environment

By default, Python and R notebooks use the project’s code environment. For each notebook, you can set a different code environment to be used when processing code within that notebook.

You can set the code environment at notebook creation time, or by changing the kernel (Kernel > Change kernel from the notebook menus).

Screenshot of selecting the code environment of a notebook.
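After changing the kernel, you can confirm which interpreter the notebook is actually using. The following sketch prints the interpreter path and tries to read an environment name from it. The `.../code-envs/python/<env name>/...` path layout is an assumption about where managed code environments are stored, and the helper `guess_code_env_name` is hypothetical.

```python
# Print which interpreter the current notebook kernel runs, and guess the code env
# name from its path. The ".../code-envs/python/<env name>/..." layout is an
# assumption; adjust the pattern if your instance stores envs elsewhere.
import sys
from pathlib import PurePosixPath


def guess_code_env_name(executable):
    """Return the path segment after code-envs/python, or None if absent."""
    parts = PurePosixPath(executable).parts
    for i in range(len(parts) - 2):
        if parts[i] == "code-envs" and parts[i + 1] == "python":
            return parts[i + 2]
    return None  # no code-envs/python segment (for example, the built-in env)


if __name__ == "__main__":
    print(sys.version)
    print(sys.executable, "->", guess_code_env_name(sys.executable) or "not a managed code env")
```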

Set a webapp or R Markdown report’s code environment

By default, Python Bokeh webapps, R Shiny webapps, and R Markdown reports use the project’s code environment. For each of these objects, you can set a different code environment to be used when processing code within the object.

While in Edit mode, on the Settings tab, you can set the code environment using the Code env dropdown.

Screenshot of the Code env dropdown on the Settings tab of a webapp.

Set a plugin’s code environment

The plugin developer defines the code environment specification as part of the plugin. After installing a plugin that contains a code environment definition, you are prompted to create a code environment for the plugin.
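For plugin developers, the specification is a set of files inside the plugin. The following sketch writes a `code-env/python/desc.json` descriptor and a `spec/requirements.txt` file. This layout and the `desc.json` keys reflect our understanding of the plugin format and are assumptions to confirm in the plugin development documentation for your Dataiku version. The helper name, interpreter value, and requirement are hypothetical.

```python
# Sketch of the files a plugin developer adds to declare a Python code env.
# The code-env/python/desc.json + spec/requirements.txt layout and the desc.json
# keys are assumptions; confirm them in your version's plugin documentation.
import json
from pathlib import Path


def write_plugin_code_env(plugin_dir, requirements, interpreters=("PYTHON39",)):
    """Create code-env/python/desc.json and spec/requirements.txt under plugin_dir."""
    env_dir = Path(plugin_dir) / "code-env" / "python"
    (env_dir / "spec").mkdir(parents=True, exist_ok=True)
    desc = {
        "acceptedPythonInterpreters": list(interpreters),  # interpreters the env may use
        "forceConda": False,
        "installCorePackages": True,
    }
    (env_dir / "desc.json").write_text(json.dumps(desc, indent=4))
    # One pinned requirement per line, as in a regular pip requirements file
    (env_dir / "spec" / "requirements.txt").write_text("\n".join(requirements) + "\n")
    return env_dir
```

When a user installs the plugin, Dataiku reads this specification and prompts them to create the code environment.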

Set a visual model’s code environment

Dataiku Visual Machine Learning lets you create custom models using Python, in addition to the built-in models. You can set the Python code environment used to train those custom models in the Runtime environment panel of the Design tab.

Screenshot of the Runtime environment panel on the Design tab of a visual model.

Set a code environment in other Dataiku objects

Many other places in Dataiku accept custom code. For example, you can write Python code in custom scenario steps and triggers, dataset metrics and checks, and custom models.

For Dataiku objects that are not focused on code but accept custom code, the code environment dropdown is typically placed near the custom code.

Screenshot of a code environment dropdown in a custom scenario step.

How-to | Edit Dataiku projects and plugins in VS Code

Goal

Though Jupyter notebooks are integrated into the Dataiku interface, some developers favor Visual Studio Code (VS Code) as an IDE. From within VS Code, you can:

  • Install the Dataiku DSS extension.

  • Configure VS Code to connect to an existing Dataiku instance.

  • Pull code from an existing code recipe, plugin or library into VS Code.

  • Edit the code in VS Code.

  • Locally run and debug code, and then

  • Save the code back to the code recipe, plugin, or library.

Note

This integration allows you to edit existing recipes and plugins on your Dataiku instance but does not allow you to create new recipes or plugins. You can, however, create new files and folders within existing plugins and libraries.

Prerequisites

  • Familiarity with code recipes or plugins in Dataiku.

Technical requirements

Install the Dataiku extension

  • In VS Code, open the Extensions panel.

  • Search for dataiku in the marketplace and select Dataiku DSS.

The Dataiku extension provides the ability to connect to a Dataiku instance and edit recipes and plugins on the instance.

Connect to a Dataiku instance

Define your connection in the ~/.dataiku/config.json file. The file should have the following form:

{
    "dss_instances": {
        "designInstance": {
            "url": "https://www.mydesigninstance.com:11500",
            "api_key": "your-design-API-key-secret"
        },
        "productionInstance": {
            "url": "https://www.myproductioninstance.com:12500",
            "api_key": "your-production-API-key-secret"
        }
    },
    "default_instance": "designInstance"
}

This configuration file defines two instances, designInstance and productionInstance, with the following settings:

  • url. The URL of the Dataiku instance, without / at the end

  • api_key. The secret for your personal API key

designInstance is designated as the default instance to use; this is the instance VS Code will connect to. If you change the default instance, you’ll need to restart VS Code to pick up the changes.
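To see how the default instance is resolved from this file, consider the following minimal sketch. It reads the file and returns the URL and API key of the instance named by `default_instance`. The helper name is hypothetical, and the file structure follows the example above. The usage comment relies on `dataikuapi.DSSClient(host, api_key)` to open an API connection.

```python
# Minimal sketch: read ~/.dataiku/config.json and return the URL and API key of
# the default instance. The helper name is hypothetical; the structure matches
# the dss_instances / default_instance example.
import json
from pathlib import Path


def load_default_instance(path=Path.home() / ".dataiku" / "config.json"):
    """Return (url, api_key) for the instance named by default_instance."""
    config = json.loads(Path(path).read_text())
    name = config["default_instance"]
    instance = config["dss_instances"][name]
    url = instance["url"]
    if url.endswith("/"):  # URLs must be given without a trailing /
        raise ValueError(f"{name}: url must not end with '/': {url}")
    return url, instance["api_key"]


# Example usage:
#   import dataikuapi
#   url, api_key = load_default_instance()
#   client = dataikuapi.DSSClient(url, api_key)
```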

Note

The ~/.dataiku/config.json file is used by all of Dataiku’s IDE integrations, so once you’ve set it up for one IDE, it’s easy to reuse for another.

Edit a Dataiku project or plugin

In VS Code, open the Dataiku DSS panel. You’ll see a list of Projects and Plugins.

Under Projects, you can see the lists of Recipes, Webapps, Wikis, and Libraries within each project. Navigate to, and then open, the file you want to edit.

Under Plugins, you can see the entire folder structure of each plugin.

Changes made in VS Code are synchronized to Dataiku when you save the file.

VS Code screenshot of a Dataiku Python recipe.

How-to | Edit Dataiku projects and plugins in PyCharm

Goal

Though Jupyter notebooks are integrated into the Dataiku interface, some developers favor PyCharm as an IDE. From within PyCharm, you can:

  • Install the Dataiku plugin

  • Configure PyCharm to connect to an existing Dataiku instance

  • Pull code from an existing code recipe, plugin, or library into PyCharm

  • Edit the code in PyCharm

  • Locally run and debug code recipes, and then

  • Save the code back to the code recipe or plugin

Note

This integration allows you to edit existing recipes, plugins, and libraries on your Dataiku instance but does not allow you to create new recipes or plugins. You can, however, create new files and folders within existing plugins and libraries.

Prerequisites

  • Familiarity with code recipes or plugins in Dataiku.

Technical requirements

Install the Dataiku plugin

  • In PyCharm, open Preferences and navigate to the Plugins page.

  • Search for dataiku in the marketplace and select Dataiku DSS.

The Dataiku plugin provides commands for connecting to a Dataiku instance and editing recipes on the instance.

Connect to a Dataiku instance

  • In PyCharm, open Preferences and navigate to the Dataiku DSS Settings page.

PyCharm screenshot of the Dataiku plugin showing the Dataiku DSS Settings page.

The “Synchronization” settings specify whether PyCharm should automatically synchronize changes between the local code base and the code on the Dataiku instance.

The “Instances” settings specify the information necessary to connect to a Dataiku instance. The parameters available for each Dataiku instance are:

  • Display name. A descriptive name for the Dataiku instance that will be displayed in PyCharm

  • Base URL. The base URL of the Dataiku instance, without / at the end

  • Personal API key secret. The secret for your personal API key

Save the settings.

Edit a Dataiku recipe, plugin, or library

Before opening a Dataiku project, you must first create a project in PyCharm. Go to File > New Project, confirm the settings, and then select Create.

Then, to open a Dataiku project, go to File > Open Dataiku DSS. In the dialog, select:

  • DSS instance. Choose from among the instances you’ve set up on the Dataiku DSS Settings page of PyCharm’s Preferences.

  • Type. Choose whether you want to edit a Recipe, Plugin, or Library.

If you choose Recipe, select Next, and then choose the project and the Python recipe within that project that you want to edit. To run and debug the recipe locally, you may need to click Install to install the Dataiku client library in your virtual environment. If you are using a stock or Conda Python installation, you must install the library manually by following the Dataiku package installation instructions.
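For the manual case, the following sketch builds a pip command that installs the client library from the instance itself. The `/public/packages/dataiku-internal-client.tar.gz` path is an assumption based on Dataiku's instructions for using the APIs outside of Dataiku, so confirm it in the package installation instructions for your instance. The helper name is hypothetical.

```python
# Minimal sketch: build the pip command that installs the Dataiku client library
# into the interpreter running this script. The package path on the instance is
# an assumption; check your instance's package installation instructions.
import shlex
import sys


def client_install_command(base_url):
    """Return a pip command (as an argument list) that installs the client from base_url."""
    base_url = base_url.rstrip("/")  # Dataiku base URLs are given without a trailing /
    if not base_url.startswith(("http://", "https://")):
        raise ValueError("base_url must start with http:// or https://")
    package = f"{base_url}/public/packages/dataiku-internal-client.tar.gz"
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    print(shlex.join(client_install_command("https://localhost:11500")))
```

Run the printed command with the same interpreter that PyCharm uses for the project, so that the library lands in that environment.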

If you choose Plugin, select Next, and then choose the plugin on that instance that you want to edit. The entire folder structure of the plugin is downloaded locally to PyCharm.

If you choose Library, select Next, and then choose the library on that instance that you want to edit.

Changes made in PyCharm are synchronized to Dataiku automatically on the schedule defined in the Dataiku DSS Settings page, or when you explicitly choose File > Synchronize with DSS.

PyCharm screenshot of a Dataiku Python recipe.

How-to | Edit Dataiku projects and plugins in Sublime

Goal

Though Jupyter notebooks are integrated into the Dataiku interface, some developers favor Sublime Text as an IDE. From within Sublime, you can:

  • Install the Dataiku package

  • Configure Sublime to connect to an existing Dataiku instance

  • Pull code from an existing code recipe or plugin into Sublime

  • Edit the code in Sublime, and then

  • Save the code back to the code recipe or plugin

Note

This method can edit existing recipes and plugin files on the Dataiku instance, but cannot create new recipes or files.

Prerequisites

  • Familiarity with code recipes or plugins in Dataiku.

Technical requirements

Install the Dataiku package

  • In Sublime Text, open Tools > Command Palette.

  • Search for install and select Package Control: Install Package.

  • Search for dataiku and select Dataiku.

The Dataiku package provides commands for connecting to a Dataiku instance and editing recipes on the instance.

Connect to a Dataiku instance

  • Open the Sublime Text Command Palette

  • Search for dataiku and select Dataiku: Configure DSS instances

  • This opens Dataiku.sublime-settings. Insert JSON of the format shown below, using the information for your instance and personal API key secret.

  • Save the settings.

{
    "instances": [
        {
            "name": "My DSS Instance",
            "base_url": "http://localhost:12000",
            "api_key": "SaHZlgrDHi1AfAc14flWt8vIgUyyUy6V",
            "list_of_project_keys_to_exclude": [],
            "list_of_plugin_ids_to_exclude": [],
            "keep_only_code_recipes": true
        }
    ]
}

The parameters available for each Dataiku instance are:

  • name. A descriptive name for the Dataiku instance that will be displayed in Sublime Text

  • base_url. The base URL of the Dataiku instance, without / at the end

  • api_key. The secret for your personal API key

  • list_of_project_keys_to_exclude. An optional list of project keys to exclude when Sublime Text searches for code recipes

  • list_of_plugin_ids_to_exclude. An optional list of plugin IDs to exclude when Sublime Text lists plugins

  • keep_only_code_recipes. Determines whether visual recipes are hidden in Sublime Text. We strongly recommend keeping this option set to true.
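Before saving Dataiku.sublime-settings, you can sanity-check the `instances` entries against the parameter descriptions above. The function in the following sketch is hypothetical and not part of the Sublime package. Sublime settings files tolerate comments and trailing commas, so this sketch assumes the file contains plain JSON.

```python
# Minimal sketch: sanity-check the "instances" entries of Dataiku.sublime-settings.
# The checks mirror the parameter descriptions; the function is hypothetical.
# Assumes plain JSON (no comments or trailing commas).
import json

REQUIRED_KEYS = ("name", "base_url", "api_key")


def check_sublime_settings(text):
    """Return a list of human-readable problems found in the settings JSON."""
    problems = []
    for i, inst in enumerate(json.loads(text).get("instances", [])):
        label = inst.get("name") or f"instance #{i}"
        for key in REQUIRED_KEYS:
            if not inst.get(key):  # absent or empty
                problems.append(f"{label}: missing {key}")
        if inst.get("base_url", "").endswith("/"):
            problems.append(f"{label}: base_url must not end with '/'")
        # Only flag an explicit non-true value; an absent key is not reported
        if inst.get("keep_only_code_recipes", True) is not True:
            problems.append(f"{label}: keep_only_code_recipes should be true")
    return problems
```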

Edit a Dataiku recipe

  • Open the Sublime Text Command Palette

  • Search for dataiku and select Dataiku: Edit DSS recipes

  • Choose the recipe you want to edit from the list

Changes made in Sublime are written to the recipe in Dataiku when you save, overwriting any edits made in the meantime on the Dataiku instance.
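If several people edit the same recipe, you can guard against this overwrite yourself before pushing. Re-fetch the remote code first, and refuse to save if it has changed since you downloaded it. This is a manual safeguard, not something the Sublime package does. In the following sketch, `fetch_remote` and `push_remote` are hypothetical callables. With dataikuapi, they could wrap a code recipe's `get_settings().get_code()` and `set_code()` followed by `save()`.

```python
# Sketch of a manual safeguard against overwriting concurrent edits: compare the
# remote code with what you originally downloaded before pushing your version.
# fetch_remote() -> str and push_remote(str) are hypothetical callables.


class RemoteChangedError(RuntimeError):
    """Raised when the recipe was edited on the instance after download."""


def save_if_unchanged(fetch_remote, push_remote, downloaded, edited):
    """Push `edited` only if the remote code still equals `downloaded`."""
    current = fetch_remote()
    if current != downloaded:
        raise RemoteChangedError("Recipe changed on the instance since download; merge first.")
    push_remote(edited)
```

This check only narrows the window for lost edits. Someone could still change the recipe between the check and the save.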

Screenshot of a Dataiku recipe being edited from Sublime Text.

Edit a Dataiku plugin

  • Open the Sublime Text Command Palette

  • Search for dataiku and select Dataiku: Edit DSS plugins

  • Choose the plugin whose code you want to edit

  • Choose the file in the plugin that you want to edit

Changes made in Sublime are written to the plugin in Dataiku when you save, overwriting any edits made in the meantime on the Dataiku instance.

Screenshot of a Dataiku plugin being edited from Sublime Text.

What’s next?

The DataikuSublimeText extension is open source; feel free to contribute.

How-to | Edit Dataiku recipes in RStudio

Goal

Though Jupyter notebooks are integrated into the Dataiku interface, many R developers use RStudio. From within RStudio, you can:

  • Install the dataiku package

  • Connect to an existing Dataiku instance

  • Pull code from an R code recipe into RStudio

  • Edit the code in RStudio, and then

  • Save the code back to the R recipe

Note

This method can edit existing recipes and plugin files on the Dataiku instance, but cannot create new recipes or files.

Prerequisites

  • You should work through the R in Dataiku tutorial, or otherwise have a project with an available R recipe.

Technical requirements

Install the Dataiku R package

Within RStudio, run the following code to install the dataiku package. Replace URL_Dataiku_instance with the URL to a Dataiku instance; for example, if the instance is using a secure protocol on localhost and listening on port 11500, then replace URL_Dataiku_instance with https://localhost:11500.

install.packages("<URL_Dataiku_instance>/public/packages/dataiku_current.tar.gz", repos=NULL)

Warning

The URL begins with either http or https, depending on how the instance was set up and whether it uses a secure protocol.

To install the dataiku package, you may need to install some dependencies. Follow RStudio’s (Posit’s) instructions for managing R packages.

The dataiku package provides add-ins for connecting to a Dataiku instance and managing R recipes on the instance.

Connect to a Dataiku instance

  • From the Add-ins menu, choose Dataiku: Setup DSS instance. This opens a dialog where you manage connections to Dataiku instances.

  • Click +Connect to Another DSS Instance and fill out the following information:

    • Name. Give the connection a descriptive name.

    • URL. Provide the URL of your Dataiku instance; for example, https://localhost:11500 for the instance above.

    • API Key. This is the secret of your personal API key.

  • Select this as your active connection.

  • Click Save.

Edit a Dataiku recipe

  • Open a new R Script.

  • From the Add-ins menu, choose Dataiku: download R recipe code.

  • Choose the project key of a project with an R recipe; for example, DKU_TUTORIAL_R if you completed the R and Dataiku course.

  • Choose the recipe you want to edit from the list; for example, compute_orders_by_customer.

  • Click Download.

Get recipe content dialog from the Dataiku add-in for RStudio

The code of the Dataiku R recipe is downloaded into the R script in RStudio. You can run the code like any other R script in RStudio. With the dataiku package installed, the dkuReadDataset() function uses the Dataiku API to pull the Dataiku dataset from the server into a local R data frame. Other dataiku package functions likewise use the Dataiku API.

The exception is writing: you cannot write from RStudio to a Dataiku dataset. Instead, save your code changes back to the Dataiku recipe, and then run the recipe within Dataiku.

Note

You cannot create a new Dataiku R recipe through RStudio using these add-ins; the R recipe must already exist.

Save changes to a recipe

To save changes made in RStudio to the recipe in Dataiku:

  • Choose Dataiku: save R recipe code from the Add-ins menu.

  • Click Send to DSS.

This overwrites any edits made in the meantime on the Dataiku instance.

FAQ | Why should I use a code environment?

Dataiku code environments address the problem of managing dependencies and versions when writing code in R and Python. They are similar to Python virtual environments (virtualenv).
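To make the virtualenv comparison concrete, the following sketch uses Python's standard `venv` module to create two independent environments. Each one gets its own interpreter configuration and its own site-packages. This is plain Python and not a description of how Dataiku builds code environments internally. The helper name and environment names are hypothetical.

```python
# Illustration of the isolation idea with Python's standard venv module: two
# independent environments, each with its own interpreter and site-packages.
# Plain Python only; not how Dataiku builds code envs internally.
import tempfile
import venv
from pathlib import Path


def make_isolated_envs(names, root=None):
    """Create one bare virtual environment per name; return their directories."""
    root = Path(root or tempfile.mkdtemp())
    envs = {}
    for name in names:
        env_dir = root / name
        venv.create(env_dir, with_pip=False)  # skip pip to keep the demo fast
        envs[name] = env_dir
    return envs
```

Packages installed into one environment are invisible to the other. This is the same property that lets two teams pin different library versions.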

Code environments provide a number of benefits, including:

  • Isolation. Two teams can work independently on different projects using different versions of Python (or R) and a set of libraries whose versions differ.

  • Reproducibility of results. When you create a project bundle or API service package and push it to production, Dataiku includes the specification for the project’s code environment, and then rebuilds the code environment according to that specification when you import the bundle into the Dataiku Automation node or the package into the Dataiku API node. In this way, environments are versioned on your production server and you can roll back your code to a previous version together with its code environment.
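The reproducibility idea rests on treating an environment as a pinned specification. The following sketch captures the current Python environment as `name==version` lines and compares two such specifications, for example a design specification and a production specification. Dataiku manages this for you in bundles and packages; the sketch only illustrates the concept, and both helper names are hypothetical.

```python
# Sketch of the reproducibility idea: capture an environment as a pinned
# specification, then compare two specifications (say, design vs production).
from importlib import metadata


def freeze():
    """Return sorted 'name==version' lines for the distributions installed here."""
    pins = {
        dist.metadata.get("Name").lower(): dist.version
        for dist in metadata.distributions()
        if dist.metadata.get("Name")
    }
    return sorted(f"{name}=={version}" for name, version in pins.items())


def diff_specs(old, new):
    """Map each package whose pin differs to (old_version, new_version); None = absent."""
    def parse(lines):
        return dict(line.split("==", 1) for line in lines)

    a, b = parse(old), parse(new)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

For example, comparing `["pandas==2.2.2"]` with `["pandas==2.1.0"]` reports `{"pandas": ("2.2.2", "2.1.0")}`. That result is the kind of drift a rebuilt, versioned environment is meant to prevent.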