Using Jupyter Notebooks in DSS

Jupyter notebooks are a favorite tool of many data scientists. They provide an ideal environment for analyzing datasets interactively from a web browser, combining code, graphical output, and rich text in a single document.

Given their usefulness for data science, Jupyter notebooks are natively embedded in Dataiku DSS and tightly integrated with its other components, such as datasets, recipes, dashboards, and models.

Creating a Jupyter Notebook

Depending on your objectives, you can create a Jupyter notebook in DSS in a number of different ways:

  • To create a blank notebook, go to the Notebook section of the Code menu (shortcut G+N) and click + New Notebook. You can then create a code notebook in one of several languages.

"Creating a new notebook"

At this point, you can start a Jupyter notebook with a Python, R, or Scala kernel, in the code environment of your choice. You will also be asked to choose a starter template: for example, do you want to read a dataset into memory or process it with Spark?

  • A second option starts from a dataset, so the notebook already reads it in for you using the Dataiku API. From the Flow, select the dataset, open the Lab, and create a new code notebook.

"Creating a new notebook from within the Flow"

The starter code of a notebook created this way already reads the chosen dataset into a df variable, as a pandas, R, or Scala dataframe depending on the language.

"New notebook with minimal pre-filled code"

  • One last option is similar to the Lab route. From the Flow, select a dataset and create a Python, R, or Scala code recipe, then select the Edit in Notebook option. This opens a Jupyter notebook where you can develop the recipe interactively before saving it back to the Flow.

"Editing a code recipe in a notebook"
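Whichever route you choose, Python code in a DSS notebook or recipe will often follow a read, transform, write pattern: the Dataiku API loads a dataset as a pandas dataframe, your code transforms it, and a recipe writes the result to an output dataset. The sketch below assumes the dataiku package's Dataset.get_dataframe() and Dataset.write_with_schema() methods; the functions clean_orders() and run(), the dataset names, and the quantity and unit_price columns are hypothetical placeholders. The dataiku import sits inside run() because that package is only available inside DSS, which keeps the transformation step testable on its own.

```python
import pandas as pd


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical transformation step: drop rows with missing values
    and add a derived column. Replace with your own logic."""
    # Work on a copy so the input dataframe is left unchanged.
    out = df.dropna().copy()
    # Hypothetical columns: quantity and unit_price are placeholders.
    out["total"] = out["quantity"] * out["unit_price"]
    return out


def run(input_name: str, output_name: str) -> None:
    """Read a DSS dataset, transform it, and write the result back."""
    # The dataiku package is only importable inside DSS, so import it here.
    import dataiku

    # Load the input dataset into a pandas dataframe.
    df = dataiku.Dataset(input_name).get_dataframe()
    # Write the result, letting DSS derive the output schema from the dataframe.
    dataiku.Dataset(output_name).write_with_schema(clean_orders(df))
```

Inside DSS you might call run("orders", "orders_clean") with your own dataset names; outside DSS, clean_orders() can be tried on a small in-memory dataframe.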

Pre-defined Notebook Templates

Another useful feature of Jupyter notebooks in DSS is the set of pre-defined code notebooks that kickstart common kinds of statistical analysis, such as dimensionality reduction, time series analysis, or topic modeling. You can run these notebooks as they are, or modify them to take an analysis further.

To create one, open the Lab for a dataset and choose a pre-defined notebook instead of a blank one.

"Creating a pre-defined notebook"

You can also create your own notebook templates through the plugin system.

For more information about pre-defined notebooks, please see the reference documentation.
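To give a feel for what a dimensionality-reduction analysis computes, here is a minimal principal component analysis (PCA) sketch using NumPy's singular value decomposition. It is an illustration only, not the code of the DSS template, which may use another library or a different method; the pca() function and its return values are hypothetical.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """Hypothetical helper: project the rows of X onto their top principal
    components. Returns (projected data, explained variance ratio per component)."""
    # Center each column so the components describe variance around the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # The variance explained by each component is proportional to S**2.
    ratio = S**2 / np.sum(S**2)
    # Coordinates of each row along the first n_components axes.
    projected = Xc @ Vt[:n_components].T
    return projected, ratio[:n_components]
```

On a numeric pandas dataframe df, pca(df.to_numpy(dtype=float), 2) would return a two-column projection that can be plotted directly in the notebook.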

Sharing Output from Jupyter Notebooks

On a collaborative platform like DSS, sharing work and analyses is essential. DSS lets you save static HTML exports of Jupyter notebooks and share them on dashboards.

To share a notebook on a dashboard, click Publish from the Actions menu of the notebook and choose the dashboard and slide where it should appear. This also adds the notebook to the list of saved Insights.

"Publishing a notebook to a dashboard"

Alternatively, you can create the Insight first and publish it on a dashboard afterwards. Navigate to the Insights menu (G+I), click + New Insight, and choose Notebook from the available options.

"Creating a notebook insight"

To learn more about sharing Jupyter notebooks as insights, please see the reference documentation.

Generating a Notebook from a Model

Finally, another interesting feature is the ability to create a Jupyter notebook directly from a trained machine learning model.

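For models trained with the in-memory Python engine, you can export a Jupyter notebook containing code that trains a similar version of the model. The export is meant for explanatory purposes, so it may not reproduce the visual model exactly. You can access this feature from the caret menu next to the Deploy button.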

"Model action menu: export to notebook"

For more information, please consult the reference documentation.

What’s Next?

Jupyter notebooks are first-class citizens in DSS. They are in the toolbox of most data scientists, and they make a great environment for interactively analyzing your datasets using Python, R, or Scala.

To learn more about notebooks in DSS, including SQL notebooks, please see the reference documentation on code notebooks.