Tutorial | Static insights#

In this tutorial, you will practice creating and publishing custom visualizations as static insights in Dataiku.

Conceptual understanding#

Static insights are data files generated with code in Dataiku. They let you save any custom figure and publish it on a dashboard. You can use them to build and share custom visualizations with libraries such as dash, bokeh, plotly, and ggplot2, depending on your needs and choice of programming language.

There are several Dataiku functions that allow you to save your custom visualizations as static insights. They vary based on the code library you are using:

  • insights.save_bokeh for bokeh.

  • insights.save_plotly for plotly.

  • dkuSaveGgplotInsight for ggplot2, etc.

You can also use the generic insights.save_data for any other visualization library. This function can be used to save your visualization as an HTML object (maintaining its interactivity) or as an image.
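
As a rough illustration of this choice, the sketch below picks a save function name from the top-level package of a figure object. The names `SAVE_FUNCTIONS`, `pick_save_function`, and `FakePlotlyFigure` are hypothetical and introduced only for this example; they are not part of the Dataiku API. Only the Python function names listed above are taken from this tutorial.

```python
# Python save functions named in this tutorial, keyed by the top-level
# package of the figure object. Anything else falls back to save_data.
SAVE_FUNCTIONS = {
    "plotly": "save_plotly",
    "bokeh": "save_bokeh",
}


def pick_save_function(fig):
    """Return the name of the dataiku.insights function to use for fig."""
    top_level_package = type(fig).__module__.split(".")[0]
    return SAVE_FUNCTIONS.get(top_level_package, "save_data")


class FakePlotlyFigure:
    """Stand-in class so the sketch runs without plotly installed."""


# Pretend the class was defined inside the plotly package
FakePlotlyFigure.__module__ = "plotly.graph_objs"

print(pick_save_function(FakePlotlyFigure()))        # save_plotly
print(pick_save_function("<div>custom HTML</div>"))  # save_data
```

In a notebook you would normally just call the matching function directly; the lookup only makes the rule explicit: a dedicated function when one exists, the generic one otherwise.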

Get started#

In this tutorial, you will create static insights with the Plotly, ggplot2, and Altair libraries. Altair has no dedicated save function, so it will require the generic insights.save_data() function.

Prerequisites#

  • Some familiarity with Python and/or R.

  • Basic knowledge of working with code notebooks in Dataiku.

Technical requirements#

  • A Python code environment with the following packages installed (in addition to the mandatory Dataiku packages).

    plotly==4.14.3
    nbformat==4.2.0
    altair==4.0.1
    

    Note

    If you are using Dataiku Cloud, you can use the “dash” code environment. Otherwise, you can follow the instructions in this article to create a code environment compatible with all courses in the Developer learning path.

    This tutorial was tested using a Python 3.6 code environment. Other Python versions may be compatible.

  • An R code environment with the ggplot2 package installed (note that Dataiku Cloud cannot be used for creating a ggplot2 static insight).

Tip

You can use the Dataiku built-in R environment, which includes the ggplot2 package.

Create your project#

To get started, you need to create or access a starter project in one of the following ways:

Import a starter project#

From the Dataiku homepage, click +New Project > DSS tutorials > Developer > Visualization.

Note

You can also download the starter project from this website and import it as a zip file.

Continue from the previous course#

If you are following the Academy Visualization course and have already completed one or more of the previous tutorials, you can continue working in the same project you created earlier.

Download and import dataset into new project#

Alternatively, you can download the gas_prices dataset and import it into a new project.

Create a plotly static insight#

In this exercise, we will create a simple scatter plot using sample data that’s built into the Plotly Express module. It contains information about iris flower species.

  1. To get started, navigate to the Code menu and select Notebooks.

  2. Create a new empty Python notebook and name it plotly-insight.

    The newly created notebook contains some starter code, which you can modify to import the required packages and modules.

  3. Replace the starter code in the second cell with the following:

    import dataiku
    from dataiku import insights
    import plotly.express as px
    
  4. Next, replace the starter code in the third cell in order to load a sample Plotly Express dataset as a dataframe, and then create and display a simple scatter plot figure.

    df = px.data.iris()
    fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species")
    fig.show()
    
  5. Run the first three notebook cells in order.

    The newly created Plotly figure should now be displayed inline.

  6. Finally, enter the following code in the fourth and last cell in order to save the figure as a static insight.

    # fig is a plot.ly figure, or any object that can be passed to iplot()
    insights.save_plotly("plotly-insight", fig)
    
  7. Run the cell and save your progress.

You can now find your newly saved plotly-insight in the Insights menu, and publish it on a dashboard. To do so:

  1. Navigate to the Insights menu and click on “plotly-insight” to view the static insight.

  2. From the Actions menu in the right sidebar, click Add Insight.

  3. Select the dashboard and slide you wish to publish your insight on, and click Pin.

  4. Navigate to the Dashboards menu and open your selected dashboard.

You will now see the published static insight appear on the dashboard. You can optionally edit its name, size, and position from the Edit tab.

Create a ggplot2 static insight#

Next, following a very similar process, we will create, save, and publish a ggplot2 figure. We will use publicly available fuel economy data, which is also built into the ggplot2 library as a dataset named mpg.

  1. Return to the Notebooks page.

  2. Create a new empty R notebook and name it ggplot2-insight.

    Like Python notebooks, newly created R notebooks also contain sample code to help you get started.

  3. In the first code cell, add the following line of code below library(dataiku).

    library(ggplot2)
    
  4. Replace the sample code in the second cell with the following:

    gg <- ggplot(mpg, aes(displ, hwy, colour = class)) +
      geom_point()
    
  5. Enter the function below into the third cell to save your figure as a static insight.

    dkuSaveGgplotInsight("ggplot2-insight", gg)
    
  6. Run each code cell in order and save your progress.

Just like the Plotly insight, you can now find the ggplot2-insight in the Insights menu, and publish it on a dashboard. To do so:

  1. Navigate to the Insights menu and click on ggplot2-insight to view the static insight.

  2. From the Actions menu in the right sidebar, click Add Insight.

  3. Select the dashboard and slide you wish to publish your insight on, and click Pin.

You can now navigate to the Dashboards menu to view and edit the published insight.

Create a static insight with another library#

If you want to create static insights with libraries that don’t have their own dedicated “save” function, you have to use the generic insights.save_data() function and pass it your figure, exported as an HTML object, as the payload.

In the following example, we will use Altair, a declarative visualization library for Python, to build an interactive time series chart.

Note

Make sure you are using a code environment set up with the altair==4.0.1 package installed.

  1. Return to the Notebooks page.

  2. Create a new empty Python notebook and name it altair-insight.

Import libraries and modules#

As in the previous two examples, the notebook contains starter code.

Add the following import statements in the first cell, after the ones that already appear in the starter code.

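import dataiku.insights
import base64
import pandas as pd  # normally already in the starter code; repeated here because pd is used below
# requires a code env with altair 4.0.1
import altair as alt

The base64 module, which is part of the Python standard library, will be used in the last step to encode the HTML export of the Altair figure.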

Set up the data#

Next, replace the sample code in the second cell with the following, in order to load and transform the data.

# load DSS datasets as Pandas dataframes
gas_prices = dataiku.Dataset("gas_prices")
df_prices = gas_prices.get_dataframe()

# aggregate the dataset to the national level (vs. state level)
df_prices = df_prices.groupby(['date_start','kind_of_product'])['mean_distribution_price'].mean().reset_index()
# remove one product; .copy() guards against a SettingWithCopyWarning on the assignment below
df_prices = df_prices.query('kind_of_product != "GLP_R$/13Kg"').copy()
# parse the dates so that date_start is handled as a date rather than as text
df_prices["date_start"] = pd.to_datetime(df_prices["date_start"])
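
To make the aggregation step concrete, here is the same group-and-average logic written in plain Python on a few made-up rows, so that it runs without pandas or the gas_prices dataset. The product names and prices are invented for illustration and do not come from the real data.

```python
from collections import defaultdict
from statistics import mean

# Made-up state-level rows: (date_start, kind_of_product, mean_distribution_price)
rows = [
    ("2019-01-06", "GASOLINE", 4.0),
    ("2019-01-06", "GASOLINE", 4.5),
    ("2019-01-06", "DIESEL", 3.0),
    ("2019-01-13", "GASOLINE", 4.25),
]

# Collect prices per (date_start, kind_of_product), as the groupby call does
prices_by_key = defaultdict(list)
for date_start, product, price in rows:
    prices_by_key[(date_start, product)].append(price)

# One averaged value per key, i.e. one national-level row per date and product
national = {key: mean(prices) for key, prices in prices_by_key.items()}

for key in sorted(national):
    print(key, national[key])
```

The two GASOLINE rows for the first date collapse into a single value. As in the pandas version, this is an unweighted mean: every state-level row counts equally.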

Build the visualization#

Enter the following code into the next cell in order to build the Altair figure.

#### Building our figure with the Altair library
# based on this example:
# https://altair-viz.github.io/gallery/multiline_tooltip.html

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['date_start'], empty='none')

# The basic line
line = alt.Chart(df_prices).mark_line().encode(
    x='date_start',
    y='mean_distribution_price:Q',
    color='kind_of_product:N'
)

# Transparent selectors across the chart. This is what tells us
# the x-value of the cursor
selectors = alt.Chart(df_prices).mark_point().encode(
    x='date_start',
    opacity=alt.value(0),
).add_selection(nearest)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw text labels near the points, and highlight based on selection
text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'mean_distribution_price:Q', alt.value(' '))
)

# Draw a rule at the location of the selection
rules = alt.Chart(df_prices).mark_rule(color='gray').encode(
    x='date_start',
).transform_filter(nearest)

# Put the five layers into a chart and bind the data
chart_prices = alt.layer(
    line, selectors, points, rules, text
).properties(
    width=1000, height=800, title='Evolution of the Mean Distribution Prices per product in Brazil'
)

Export the figure as a static insight#

The last step is to save our figure as a static insight. Unlike the previous two examples, which used dedicated save functions, here we have to use the generic insights.save_data() function and pass the figure as its payload. We encode the payload in base64, which is why we imported the base64 library.

  1. Add a new code cell and enter the following code:

    # Export chart as a Dataiku Insight to display in the Dashboard
    chart_prices_html = chart_prices.to_html()  # chart_prices is the Altair figure built above
    
    # base64-encode the HTML so it can be passed with encoding='base64'
    chart_prices_insight = base64.b64encode(chart_prices_html.encode("utf-8"))
    
    dataiku.insights.save_data('chart_prices', payload=chart_prices_insight, content_type='text/html', label=None, encoding='base64')
    
  2. Run each code cell in order and save your progress.
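
The encoding step above can be wrapped in a small helper. The following is a minimal sketch, assuming that `encoding='base64'` is accepted for image content in the same way as for HTML; `build_insight_payload` and the insight ID `hello-insight` are hypothetical names, not part of the Dataiku API. The call to save_data is guarded so that the snippet also runs outside of Dataiku, where the dataiku package is not available.

```python
import base64


def build_insight_payload(content, content_type):
    """Return keyword arguments for insights.save_data (hypothetical helper).

    content: an HTML string (str) or raw image bytes (bytes).
    content_type: MIME type, for example "text/html" or "image/png".
    """
    raw = content.encode("utf-8") if isinstance(content, str) else content
    return {
        "payload": base64.b64encode(raw),
        "content_type": content_type,
        "encoding": "base64",
    }


html_kwargs = build_insight_payload("<h1>Hello insight</h1>", "text/html")
# The 8-byte PNG file signature, used here as stand-in image content
png_kwargs = build_insight_payload(b"\x89PNG\r\n\x1a\n", "image/png")

# Decoding the payload gives back exactly what was encoded
print(base64.b64decode(html_kwargs["payload"]).decode("utf-8"))

try:
    from dataiku import insights  # only importable inside Dataiku
except ImportError:
    insights = None

if insights is not None:
    insights.save_data("hello-insight", label=None, **html_kwargs)
```

With the Altair figure from this tutorial, the equivalent call would pass `chart_prices.to_html()` as the content and `"text/html"` as the content type.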

You can now find the chart_prices static insight in the Insights menu, and publish it on a dashboard. To do so:

  1. Navigate to the Insights menu and open the chart_prices insight.

  2. From the Actions menu in the right sidebar, click Add Insight.

  3. Select the dashboard and slide you wish to publish your insight on, and click Pin.

You can now navigate to the Dashboards menu to view and edit the published insight.

What’s next?#

Using static insights in Dataiku, you have created, saved, and published custom visualizations created with the Plotly, ggplot2, and Altair libraries. To learn more about creating static insights with other Python and R libraries, visit: