Tutorial | Code notebooks (Developer part 1)#

Jupyter notebooks are a favorite tool of many data scientists. They provide users with an ideal environment for interactively analyzing datasets directly from a web browser, combining code, graphical output, and rich content in a single place.

Note

Before beginning this tutorial, you may wish to review Concept | Code notebooks.

Objectives#

In this tutorial, you will:

  • Create, edit, publish, and unload Jupyter notebooks in Dataiku.

Note

This tutorial only covers the use of Jupyter notebooks in Dataiku. You can find a separate tutorial on SQL notebooks.

Prerequisites#

  • Some familiarity with the basics of Dataiku (we recommend having completed the Core Designer learning path).

  • Some familiarity with coding in Python and using Jupyter notebooks.

  • An instance of Dataiku version 8.0 or above (Dataiku Cloud can also be used).

  • The required permissions for code execution.

Create the project#

The first step is to create a new Dataiku Project. You will work with a sample project containing data from the fictional Haiku T-Shirt company.

  1. From the homepage, click +New Project > DSS tutorials > Developer > Code Notebooks.

Note

You can also download the starter project from this website and import it as a zip file.

Dataiku screenshot of the starting Flow.

Create and use Jupyter notebooks#

Given their usefulness for data science, Jupyter notebooks are natively embedded in Dataiku and tightly integrated with other components, which makes them easy to use in various ways.

Create a Jupyter notebook#

Depending on your objectives, you can create a Jupyter notebook in Dataiku in a number of different ways. In this exercise, we will create a notebook from a dataset, which simplifies reading in the dataset of interest using the Dataiku API.

  1. From the Flow, select the orders dataset.

  2. In the right panel, in the Actions tab, click the Lab menu (with the microscope icon).

    Dataiku screenshot of dialog for creating a Python notebook.
  3. Under the Code Notebooks dropdown, click New.

  4. From the notebook options, select Python.

  5. Name the notebook orders analysis and click Create, leaving the default option to read the dataset in memory using Pandas.

    Dataiku screenshot of dialog for creating a Python notebook.

Edit and run code in a notebook#

The newly created notebook contains some useful starter code:

  • The first cell uses the built-in magic commands to import the numpy and matplotlib packages.

  • The second cell imports other useful packages.

  • The third cell reads in the orders dataset and converts it to a Pandas dataframe.

  • The fourth cell contains a function that performs some basic analysis on the columns of the dataset.

Dataiku screenshot of starter code in a Python notebook.

Note

The starter code of a notebook created from a dataset already reads the chosen dataset into a df variable, whether that is a Pandas, R, or Scala dataframe.

You can edit the starter code as well as write your own code in the same way you would outside of Dataiku.
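As a rough illustration of writing your own code, the sketch below runs an ordinary pandas aggregation. It does not use the real orders schema: the stand-in dataframe and its column names (tshirt_category, tshirt_price, tshirt_quantity) are hypothetical placeholders chosen for this example. Inside Dataiku, the same calls would apply to the df variable created by the starter code.

```python
import pandas as pd

# Stand-in for the df variable produced by the starter code.
# Column names and values are hypothetical, for illustration only.
df = pd.DataFrame(
    {
        "tshirt_category": ["White T-Shirt M", "Black T-Shirt F", "White T-Shirt M"],
        "tshirt_price": [18.0, 17.0, 18.0],
        "tshirt_quantity": [2, 1, 3],
    }
)

# Revenue per order line, then total revenue per category, largest first.
df["line_total"] = df["tshirt_price"] * df["tshirt_quantity"]
revenue_by_category = (
    df.groupby("tshirt_category")["line_total"].sum().sort_values(ascending=False)
)
print(revenue_by_category)
```

Nothing here is specific to Dataiku. Once the dataset is in a Pandas dataframe, the usual pandas operations work unchanged.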

In this very simple exercise, we will slightly modify the existing starter code:

  1. Delete limit=100000 from the second line of code in the third cell to remove the default dataset sampling. After removing it, the line of code should look like this:

    df = dataset_orders.get_dataframe()
    
  2. Type df.head() on a new line directly below that line. The code in the third cell should now look like this:

    # Read the dataset as a Pandas dataframe in memory
    # Note: here, we only read the first 100K rows. Other sampling options are available
    dataset_orders = dataiku.Dataset("orders")
    df = dataset_orders.get_dataframe()
    df.head()
    
  3. Run the first three cells to read in the orders dataset and display its head (by default, the first five rows).

    Dataiku screenshot of an edited Python notebook 1.
  4. Run the fourth and last cell (pdu.audit(df)), which is part of the starter code, to display some basic information about the columns of the orders dataset.

  5. Click the Save button (or use the shortcut Ctrl + S, or Cmd + S on Mac) to save your progress.

    Dataiku screenshot of an edited Python notebook 2.
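To make the last two steps more concrete, here is a minimal sketch of what head() returns and of the kind of per-column information an audit reports. It is not the implementation of pdu.audit, only an approximation built from standard pandas calls. The stand-in dataframe and its column names are hypothetical.

```python
import pandas as pd

# Stand-in dataframe; in the notebook, df comes from dataset_orders.get_dataframe().
# Column names and values are hypothetical.
df = pd.DataFrame(
    {
        "order_id": range(1, 9),
        "customer_id": ["c1", "c2", "c1", None, "c3", "c2", "c1", "c4"],
        "tshirt_quantity": [1, 2, 1, 3, 1, 1, 2, 1],
    }
)

# head() returns the first 5 rows unless another count is passed.
first_rows = df.head()
first_three = df.head(3)

# A rough per-column summary in the spirit of an audit:
# data type, number of missing values, and number of distinct values.
summary = pd.DataFrame(
    {
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "distinct": df.nunique(),
    }
)
print(first_rows)
print(summary)
```

In a notebook, leaving df.head() as the last expression in a cell displays the result without an explicit print call.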

Note

It’s also possible to create Jupyter notebooks from machine learning models. For more information, consult the reference documentation.

Publish a Jupyter notebook to a dashboard#

On a collaborative platform like Dataiku, the ability to share work and analyses is essential. Dataiku allows you to save static exports (non-interactive snapshots) of Jupyter notebooks in HTML format, which can be shared on dashboards.

To share the notebook on a dashboard:

  1. Click Publish from the Actions menu of the notebook and indicate the dashboard and slide where it should appear.

    Dataiku screenshot of the dialog for adding a notebook to a dashboard.

    By default, only the printed outputs of the notebook appear in the published insight.

  2. In the Tile tab of the dashboard’s Edit tab, select the Show code checkbox to display the code cells.

    Dataiku screenshot of a code notebook insight in a dashboard from the Edit tab.
  3. Save your changes, then navigate to the View tab to see how the notebook insight appears on the dashboard.

    Dataiku screenshot of a code notebook insight in a dashboard from the View tab.

Note

Publishing a static snapshot of a notebook to a dashboard also adds it to the list of saved insights. To learn more about sharing Jupyter notebooks as insights, see the reference documentation.

Unload a notebook#

Finally, once you’re done working in a Jupyter notebook for the time being, you can free the memory and other resources it holds by unloading it, which kills its kernel. To do this:

  1. Navigate to the Notebooks page (G+N).

  2. Check the box to select the orders analysis notebook.

  3. In the right panel, in the Actions tab, click Unload to kill the kernel.

What’s next?#

Jupyter notebooks are first-class citizens in Dataiku. They are in the toolbox of most data scientists, and they make a great environment for interactively analyzing your datasets using Python, R, or Scala.

Often you’ll want to convert code notebooks into code recipes. Take that next step in this tutorial.

Note

To learn more about notebooks in Dataiku, you can: