Hands On: Bokeh Webapp
In this tutorial, we’ll create a simple Bokeh webapp in Dataiku DSS. It’s a scatterplot on sales data from the fictional Haiku T-shirt company, related to the data used in the Basics tutorials.
Let’s Get Started!
Some familiarity with Python.
A Python code environment with the bokeh package and the mandatory Dataiku packages installed.
The default DSS builtin environment has the bokeh package and all mandatory Dataiku packages pre-installed. Therefore, it’s recommended to stick with the default settings and use the DSS builtin environment (which is selected by default in a newly created Dataiku DSS instance) for the purpose of this tutorial.
Create Your Project
Create your project by selecting one of these options:
Import a Starter Project
From the Dataiku homepage, click +New Project > DSS Tutorials > General Topics > Haiku Starter.
Create a New Bokeh Webapp
To create a new empty Bokeh webapp:
In the top navigation bar, select Webapps from the Code (</>) menu.
Click + New Webapp.
Click Code Webapp.
From the displayed options, select An empty Bokeh app.
Type a simple name for your webapp, such as bokeh webapp, and click Create.
You will land on the View tab of the webapp, which is empty for the moment, as we haven’t started creating the webapp yet.
Navigate to the Edit tab of the webapp.
Explore the Webapp Editor
The webapp editor is divided into two panes:
The left pane allows you to see and edit the Python code underlying the webapp.
The right pane gives you several views on the webapp.
The Preview tab allows you to write and test your code in the left pane while having immediate visual feedback in the right pane. At any time you can save or reload your current code by clicking on the Save button or the Reload Preview button.
The Python tab allows you to look at different portions of the code side-by-side in the left and right panes.
The Log tab is useful for troubleshooting problems.
The Settings tab allows you to set the code environment for this webapp, if you want it to be different from the project default.
In the Settings tab, under Code env, you can see the code environment that the webapp is using. It’s currently set to inherit the project default environment, which is the DSS builtin environment.
As the builtin environment already contains the necessary packages for this tutorial, you do not need to change this. However, if you wish to use another code environment for this webapp or for another one, you can change it from the Code env dropdown menu, and then click Save.
Learn more about code environments in Dataiku DSS here.
Set Up the Webapp
Let’s build the code behind the Python Bokeh webapp.
Insert the following code into the Python tab so that we’ll have the necessary tools to create the webapp.
    from bokeh.io import curdoc
    from bokeh.layouts import row, widgetbox
    from bokeh.models import ColumnDataSource
    from bokeh.models.widgets import Slider, TextInput, Select
    from bokeh.plotting import figure
    import dataiku
    import pandas as pd
Set Up the Data
Add the following code to the Python tab to parameterize the inputs to the webapp:
    # Parameterize webapp inputs
    input_dataset = "Orders_enriched_prepared"
    x_column = "age"
    y_column = "total"
    time_column = "order_date_year"
    cat_column = "tshirt_category"
By specifying this information up front, it will be easier for us to generalize the webapp later.
Next, add the following code to the Python tab to access a Dataiku dataset as a pandas dataframe.
    # Set up data
    mydataset = dataiku.Dataset(input_dataset)
    df = mydataset.get_dataframe()
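Because the column parameters are plain strings, a typo in one of them only surfaces later, as a KeyError when the column is read. The following is a minimal sketch of an early check that could run right after the dataframe is loaded. The helper name missing_columns is hypothetical (it is not part of Dataiku or pandas), and the available list is a hand-written stand-in for list(df.columns), including a made-up customer_id column.

```python
def missing_columns(available, required):
    """Return the required column names that are absent, in the order given."""
    present = set(available)
    return [name for name in required if name not in present]


# Hand-written stand-in for list(df.columns); the real list comes from the dataset
available = ["customer_id", "age", "total", "order_date_year", "tshirt_category"]

# The four column parameters defined in the tutorial
required = ["age", "total", "order_date_year", "tshirt_category"]

missing = missing_columns(available, required)
if missing:
    raise KeyError(f"Columns not found in the input dataset: {missing}")
```

Failing at this point keeps the error message close to the parameters that caused it, which helps when the webapp is later pointed at a different dataset.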
Next, we’ll add code which extracts the customer age and total amount spent from the pandas dataframe to define the source data for the visualization as a ColumnDataSource.
Add the following code to the Python tab:
    x = df[x_column]
    y = df[y_column]
    source = ColumnDataSource(data=dict(x=x, y=y))
Click Save in the upper right corner of the page.
Nothing is displayed yet because we haven’t created the visualization, but there are no errors in the log.
Define the Visualization
Add the following code to the Python tab to define the output visualization:
    # Set up plot
    plot = figure(plot_height=400, plot_width=400, title=y_column+" by "+x_column,
                  tools="crosshair,pan,reset,save,wheel_zoom",
                  x_range=[min(x), max(x)], y_range=[min(y), max(y)])
    plot.scatter('x', 'y', source=source)
With this code:
first, we have created a plot object with the desired height and width properties;
we’ve defined the title of the plot using the X- and Y-axis column names;
we have also computed the minimum and maximum values of customer age and total, and used those to define the axis limits;
and finally, we have defined the visualization as a scatterplot that plots data from the source object defined earlier.
Next, we’ll add code which defines the layout of the webapp and adds it to the current “document”. For now, we’ll include an empty widgetbox that we’ll populate in a moment when we add the interactivity.
Add the following code to the Python tab to define the layout of the app:
    # Set up layouts and add to document
    inputs = widgetbox()
    curdoc().add_root(row(inputs, plot, width=800))
Click Save.
The preview should now show the current (non-interactive) scatterplot.
The current scatterplot includes all orders from 2013-2017, across all types of t-shirts sold. Now, let’s add the ability to select a subset of years, and a specific category of t-shirt. To do this, we need to make changes to the Python code.
The code in this section should be added after the code which sets up the plot, but before the code which defines the layout of the webapp.
In the Python tab, after the # Set up plot block of code, and before the # Set up layouts and add to document block, add the following:
    # Set up widgets
    text = TextInput(title="Title", value=y_column+" by "+x_column)
    time = df[time_column]
    min_year = Slider(title="Time start", value=min(time), start=min(time), end=max(time), step=1)
    max_year = Slider(title="Time max", value=max(time), start=min(time), end=max(time), step=1)
    cat_categories = df[cat_column].unique().tolist()
    cat_categories.insert(0,'All')
    category = Select(title="Category", value="All", options=cat_categories)
This defines four widgets:
text accepts text input to be used as the title of the visualization.
min_year and max_year are sliders that take values from 2013 to 2017 in integer steps. Their default values are 2013 and 2017, respectively.
category is a selection that has an option for each t-shirt category, plus “All”. Its default value is “All”.
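The option list for the category widget is the distinct values of the category column with "All" placed first. The sketch below shows that construction in plain Python so it can run outside DSS. It assumes a hypothetical helper named build_category_options and made-up sample values, with None standing in for a missing value (pandas would report NaN). Since the options are meant to be text labels, the sketch also drops missing values, which the tutorial code does not do.

```python
def build_category_options(values):
    """Return ["All"] followed by the distinct non-missing values, in order of first appearance."""
    # dict.fromkeys keeps the first occurrence of each value and preserves order
    distinct = dict.fromkeys(v for v in values if v is not None)
    return ["All"] + [str(v) for v in distinct]


# Made-up sample values standing in for df[cat_column]
sample = ["Hoodie", "White T-Shirt M", "Hoodie", None, "Tennis Shirt"]
options = build_category_options(sample)
print(options)
```

Keeping "All" at index 0 matches the widget's default value, so the webapp starts with no category filter applied.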
Set Up Update Functions and Callbacks
Next, we’ll add the instructions on how to update the webapp when a user interacts with it.
After the # Set up widgets code block, add the following code to the Python tab:
    # Set up update functions and callbacks
    def update_title(attrname, old, new):
        plot.title.text = text.value

    def update_data(attrname, old, new):
        category_value = category.value
        selected = df[(time>=min_year.value) & (time<=max_year.value)]
        if (category_value != "All"):
            selected = selected[selected[cat_column].str.contains(category_value)==True]
        # Generate the new plot
        x = selected[x_column]
        y = selected[y_column]
        source.data = dict(x=x, y=y)
When the title text is changed, update_title sets plot.title.text to the new value.
When the sliders or the select widget are changed, update_data takes the input dataframe df and uses the widget selections to filter the dataframe down to the records with the matching order year and t-shirt category. It then sets the X and Y values of the scatterplot to the age and order total from the filtered dataframe.
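The filtering rule applied by update_data can be sketched in plain Python, so it can be tried outside DSS and without pandas. The function name filter_orders and the sample rows are made up for illustration. One deliberate difference: the tutorial uses str.contains, which in pandas treats its argument as a regular expression by default and matches substrings, whereas this sketch compares the category exactly.

```python
def filter_orders(rows, year_min, year_max, category,
                  time_column="order_date_year", cat_column="tshirt_category"):
    """Keep rows inside [year_min, year_max] and, unless category is "All", in that category."""
    selected = [r for r in rows if year_min <= r[time_column] <= year_max]
    if category != "All":
        # Exact comparison: stricter than the substring match done by str.contains
        selected = [r for r in selected if r[cat_column] == category]
    return selected


# Made-up rows for illustration only; they are not taken from the Haiku T-Shirt data
rows = [
    {"age": 25, "total": 40.0, "order_date_year": 2013, "tshirt_category": "Hoodie"},
    {"age": 31, "total": 18.5, "order_date_year": 2015, "tshirt_category": "White T-Shirt M"},
    {"age": 47, "total": 62.0, "order_date_year": 2016, "tshirt_category": "Hoodie"},
    {"age": 52, "total": 23.0, "order_date_year": 2017, "tshirt_category": "Tennis Shirt"},
]

in_range = filter_orders(rows, 2014, 2016, "All")
hoodies = filter_orders(rows, 2013, 2017, "Hoodie")
inverted = filter_orders(rows, 2016, 2014, "All")  # start slider set above end slider

# The x and y columns that would be handed to the data source
x = [r["age"] for r in hoodies]
y = [r["total"] for r in hoodies]
```

The inverted case is worth noting: when the start slider is set above the end slider, no row satisfies both conditions and the plot is empty.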
Next, we’ll add the following two pieces of code, which listen for changes to the widget values using the on_change() method and call the functions above to update the webapp:
Add the following code right after the update_title() function and right before the update_data() function:
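For the title widget, a minimal sketch of this registration, assuming it mirrors the loop used for the other widgets, is the single call text.on_change('value', update_title). The example below shows how such a registration behaves, using a small stand-in class so that it runs without Bokeh. StubWidget and the title dictionary are hypothetical and are not part of Bokeh. Python looks up the name update_title when the on_change line executes, so in this sketch the call sits after the function definition.

```python
class StubWidget:
    """Hypothetical stand-in for a Bokeh widget; it is not part of Bokeh."""

    def __init__(self, value):
        self._value = value
        self._callbacks = []

    def on_change(self, attr, *callbacks):
        # Same call shape as in the tutorial: on_change('value', callback)
        if attr != "value":
            raise ValueError("this stub only tracks 'value'")
        self._callbacks.extend(callbacks)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old, self._value = self._value, new
        # Each callback receives the attribute name, the old value, and the new value
        for callback in self._callbacks:
            callback("value", old, new)


text = StubWidget("total by age")
title = {"text": text.value}  # stand-in for plot.title


def update_title(attrname, old, new):
    title["text"] = text.value


# Assumed registration call; it runs after update_title has been defined
text.on_change("value", update_title)

# Simulates the user typing a new title into the text input
text.value = "Order total by customer age"
```

The callback signature (attrname, old, new) is the same one used by update_title and update_data in the tutorial code.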
Next, add the following code after the last line of the update_data() function (right after source.data = dict(x=x, y=y)), unindented so that it sits outside the function:
    for w in [min_year, max_year, category]:
        w.on_change('value', update_data)
Finally, change the definition of inputs as follows, to include the four widgets so that they are displayed in the webapp.
Right under the # Set up layouts and add to document comment, replace the inputs = widgetbox() line of code with the following:
    inputs = widgetbox(text, min_year, max_year, category)
Click Save and then Reload Preview.
After you’ve saved your work and refreshed the preview, the Preview tab should now show the interactive scatterplot we just created.
Publish the Webapp to a Dashboard
When you are done editing, you can publish your webapp to a dashboard.
Click Actions in the top-right corner of the screen.
From the Actions menu, click Publish.
Select Analytic Dashboard as the dashboard to publish your webapp on, and Slide 3 as the slide.
You are taken to the Edit tab of the Analytic Dashboard.
In the Edit tab of a dashboard, you can edit the way your webapp appears, or add other webapps as well as other types of insights.
Optionally, you can drag and resize your webapp, or change its title and display options from the Tile sidebar menu. Click Save when you’re done.
Click View to navigate to the View tab and see how your webapp is displayed on the dashboard.
Using Dataiku DSS, you have created an interactive Bokeh webapp and published it to a dashboard.
To go further, you can: