Hands-On Tutorial: Plugin Store

As you know, plugins allow users to extend the native features of Dataiku DSS. A plugin can contain one or more components, such as recipes, datasets, webapps, processors, and more.

Tip

The hands-on tutorial below on how to use a plugin is also found in the Dataiku Academy’s Plugin Store course, which is part of the Advanced Designer learning path. Register for the course if you’d like to track and validate your knowledge alongside concept videos, summaries, and quizzes.

Let’s Get Started!

In this hands-on lesson, you will use the Census USA plugin to enrich a dataset with socio-demographic variables from the US Census Bureau.

Prerequisites

This lesson assumes that you have basic knowledge of working with Dataiku DSS datasets and recipes.

Note

If you are not already on the Advanced Designer learning path, we recommend completing the Core Designer Certificate first.

You’ll need access to an instance of Dataiku DSS (version 8.0 or above) with the following plugins installed:

These plugins are available through the Dataiku Plugin store, and you can find the instructions for installing plugins in the reference documentation. To check whether the plugins are already installed on your instance, go to the Installed tab in the Plugin Store to see a list of all installed plugins.


Note

If you are not already registered in the Academy Plugin Store course, we also recommend that you complete the following lessons beforehand:

Additional Note for Dataiku Online Users

Tip

Users of Dataiku Online should note that while plugin installation is not directly available, you can still explore available plugins from your launchpad:

  • From your instance launchpad, open the Features panel on the left-hand side.

  • Click Add a Feature and choose “US Census” from the Extensions menu. (Reverse geocoding is already available by default.)

  • You can see what plugins are already installed by searching for “installed plugins” in the DSS search bar.

Workflow Overview

The following figure shows the final Flow in Dataiku DSS.

The final Dataiku Flow for the plugin store hands-on tutorial highlighting the part using the Census USA plugin.

Create Your Project

We’ll start with the completed Flow from the hands-on project found in the Visual Recipes 102 course.

  • From the Dataiku homepage, click +New Project > DSS Tutorials > Advanced Designer > Plugin Store (Tutorial).

Note

You can also use a successfully completed project from the Visual Recipes 102 course.

Need Help Creating the Project?

Note

You can also download the starter project from this website and import it as a zip file. You can find out more in this Academy FAQ.


Change Dataset Connections (Optional)

Aside from the input datasets, all of the others are empty managed filesystem datasets.

You are welcome to leave the storage connection of these datasets in place, but you can also use another storage system depending on the infrastructure available to you.

To use another connection, such as a SQL database, follow these steps:

  • Select the empty datasets from the Flow. (On a Mac, hold Shift to select multiple datasets.)

  • Click Change connection in the “Other actions” section of the Actions sidebar.

  • Use the dropdown menu to select the new connection.

  • Click Save.

Note

For a dataset that is already built, changing to a new connection clears the dataset, so it must be rebuilt.


Note

Another way to select datasets is from the Datasets page (G+D). There are also programmatic ways of doing operations like this that you’ll learn about in the Developer learning path.

The screenshots below demonstrate using a PostgreSQL database.


  • Whether starting from an existing or fresh project, ensure that the datasets income_per_tract_usa_copy and merchants_by_state are built.

See Build Details Here if Necessary

  • From the Flow, select the end datasets required for this tutorial: income_per_tract_usa_copy and merchants_by_state.

  • Choose Build from the Actions sidebar.

  • Leave the setting “Build required dependencies” in place.

  • Click Build to start the job, or click Preview to view the suggested job.

  • In the Jobs tab, you can see all the activities that Dataiku will perform.

  • Click Run, and observe how Dataiku progresses through the list of activities.

Inspect the Data

The merchants_by_state dataset contains a list of unique merchant IDs from the state of Delaware, the geographical coordinates (latitude and longitude) of each merchant location, and the merchant subsector description.

A Dataiku screenshot of the Explore tab of the merchants_by_state dataset showing a list of unique merchant IDs from Delaware.

Our first task is to determine the US census tract ID for each merchant location using this dataset. To do this, we will use the Get US census block group from lat lon recipe in the Census USA plugin.

Access the US Census Plugin

How you access the Census USA plugin depends on which of its components you choose to use.

The plugin consists of six components: three dataset connectors and three visual recipes.

A Dataiku screenshot of the Census USA page from the plugin store, highlighting the number of components it contains.

The dataset connectors from this plugin enable us to build and use the US Census data directly within Dataiku DSS. See the plugin page for more information.

Normally, to query the Census Bureau we would have to write code that uses their API to request data. A plugin recipe provides a graphical user interface (GUI) wrapper around this code.
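To make concrete the kind of code the plugin recipe spares us from writing, here is a minimal sketch of a coordinate lookup against the public Census Geocoder. It is not the plugin's actual implementation. The endpoint and parameter names reflect our understanding of the public geocoder API and should be checked against the Census Geocoder Documentation; the function names and sample coordinates are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint for coordinate lookups; verify against the
# Census Geocoder Documentation before relying on it.
GEOCODER_URL = "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"


def build_geocoder_url(latitude, longitude,
                       benchmark="Public_AR_Current",
                       vintage="Current_Current"):
    """Return the request URL for one latitude/longitude pair."""
    params = {
        "x": longitude,  # the geocoder takes longitude as x
        "y": latitude,   # and latitude as y
        "benchmark": benchmark,
        "vintage": vintage,
        "format": "json",
    }
    return f"{GEOCODER_URL}?{urlencode(params)}"


def fetch_geographies(url, timeout=30):
    """Send the request and return the parsed JSON response (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Illustrative coordinates only; not taken from the tutorial dataset.
url = build_geocoder_url(39.7447, -75.5484)
print(url)
```

Only the URL is built here; calling fetch_geographies(url) would perform the actual request.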

We will use the Get US census block group from lat lon recipe to enrich our dataset. To access the recipe, click the +Recipe button in the Flow and select Census USA from the list. Alternatively, to access the recipe from a dataset:

  • Open or select the merchants_by_state dataset in the Flow.

  • Open the Actions sidebar.

  • Click Census USA from the “Plugin recipes” section to bring up a window containing the three recipes in the plugin.

  • Select the Get US census block group from lat lon recipe.

The dialog in Dataiku upon initiating a new recipe from the Census USA plugin, showing the three plugin recipe options.

Configure the Plugin Recipe

You can now configure the input and output of the Get US census block group from lat lon recipe: specify merchants_by_state as the input dataset and create a new dataset, merchant_census_tracts, as the output. Doing this opens the Settings page of the recipe. To configure the settings:

  • Specify the value of “Column LATITUDE” as merchant_latitude and “Column LONGITUDE” as merchant_longitude.

  • Keep the value for “Benchmark” as Public_AR_Current to use the most recent snapshot of the US Census database, and “Vintage” as Current_Current to use the current address ranges as of the selected benchmark.

  • Set the “API call throttle”, which defines the pause in seconds between API calls, to 0. A zero value is fine here because the dataset is small, but you should increase it for larger datasets.

  • Select Use an id column as the value for “param_strategy.”

  • Finally, specify the “Input Column ID” to correspond to the unique IDs in the merchant_id column of the dataset.

The settings page in Dataiku for the "Get US census block group from lat lon" recipe from the Census USA plugin.
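The roles of the ID column and the API call throttle can be illustrated with a short sketch. This is an assumption about how such a recipe might behave, not the plugin's code: lookup_all, fake_lookup, and the sample rows are hypothetical, and the lookup function is a stand-in that makes no network call.

```python
import time


def lookup_all(rows, lookup, throttle_seconds=0.0, sleep=time.sleep):
    """Call lookup(lat, lon) once per row and key each result by the row's ID.

    rows is an iterable of (row_id, latitude, longitude) tuples.
    A pause of throttle_seconds is inserted between consecutive calls.
    """
    results = {}
    for index, (row_id, latitude, longitude) in enumerate(rows):
        if index > 0 and throttle_seconds > 0:
            sleep(throttle_seconds)  # pause before every call except the first
        results[row_id] = lookup(latitude, longitude)
    return results


def fake_lookup(latitude, longitude):
    """Stand-in for a real API call; returns a label built from its inputs."""
    return f"tract-for-{latitude},{longitude}"


# Invented sample rows: (merchant_id, merchant_latitude, merchant_longitude).
sample_rows = [
    ("M1", 39.74, -75.55),
    ("M2", 39.16, -75.52),
    ("M3", 38.77, -75.14),
]

results = lookup_all(sample_rows, fake_lookup, throttle_seconds=0)
print(results["M1"])
```

With a throttle of 0 the calls run back to back. A positive value spaces them out, which matters when a larger dataset generates many requests.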

Run the recipe and explore the output dataset merchant_census_tracts. You can see that the dataset now contains geographical information for each merchant: the census tract ID and the state, county, and block codes.

The Explore tab of the merchant_census_tracts dataset highlighting the additional columns fetched from the US Census bureau.

Note

For your own awareness, in increasing levels of specificity (from largest to smallest area), the Census Bureau defines states, counties, census tracts, block groups, and finally census blocks.

For more details on the returned codes, see the Census Geocoder Documentation.
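This hierarchy is reflected in the standard 15-digit census block GEOID, which concatenates the state (2 digits), county (3), tract (6), and block (4) codes. The block group is the first digit of the block code. The sketch below splits such an identifier. The function name and the sample GEOID are hypothetical, and the columns returned by the plugin may be named or formatted differently.

```python
def split_block_geoid(geoid):
    """Split a 15-digit census block GEOID into its component codes."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError("expected a 15-digit block GEOID")
    return {
        "state": geoid[0:2],         # 2-digit state FIPS code
        "county": geoid[2:5],        # 3-digit county code within the state
        "tract": geoid[5:11],        # 6-digit tract code within the county
        "block_group": geoid[11],    # first digit of the block code
        "block": geoid[11:15],       # 4-digit block code within the tract
        "tract_geoid": geoid[0:11],  # state + county + tract identifies a tract nationally
    }


# Made-up GEOID for illustration (state code 10 is Delaware).
parts = split_block_geoid("100030002001000")
print(parts)
```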

Now that we’ve determined the US census tract ID for each merchant’s location, our next task is to find the average household income of the census tracts for each merchant subsector. For this, we will perform a join of three datasets: merchant_census_tracts, merchants_by_state, and income_per_tract_usa_copy.

Join the Datasets

The income_per_tract_usa_copy dataset contains the average household income for all US census tracts. We will use a Join recipe to combine this dataset with the merchant_census_tracts and merchants_by_state datasets.

  • Select the merchant_census_tracts dataset from the Flow, and click the Join recipe from the Actions sidebar.

  • Select the additional input dataset merchants_by_state.

  • Name the output dataset merchants_with_tract_income.

In the Settings page of the Join recipe, go to the Join step. Here:

  • Click +Add Input and select income_per_tract_usa_copy as the “New input dataset” to be joined with the “Existing input dataset” merchant_census_tracts.

  • Click Add Dataset.

A Dataiku screenshot of the Join recipe settings left joining merchants_by_state and income_per_tract_usa_copy to merchant_census_tracts.

In the Selected columns step of the Join recipe:

  • Select only the “tract_id” column from the merchant_census_tracts dataset, and the “average_tract_income” column from the income_per_tract_usa_copy dataset.

  • From the merchants_by_state dataset, select all the columns.

  • Finally, run the recipe and explore the output dataset merchants_with_tract_income.

A Dataiku screenshot of the Explore tab of the merchants_with_tract_income dataset
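For readers who think in code, the logic of this Join recipe can be sketched in plain Python. This is only an illustration of two left joins, not what Dataiku executes. The join keys (merchant_id and tract_id) and the column name merchant_subsector_description are assumptions, and all row values are invented.

```python
# Toy rows standing in for the three input datasets (values are invented).
merchant_census_tracts = [
    {"merchant_id": "M1", "tract_id": "10003000200"},
    {"merchant_id": "M2", "tract_id": "10001040100"},
    {"merchant_id": "M3", "tract_id": "10005050000"},
]
merchants_by_state = [
    {"merchant_id": "M1", "merchant_subsector_description": "Gas stations"},
    {"merchant_id": "M2", "merchant_subsector_description": "Restaurants"},
    {"merchant_id": "M3", "merchant_subsector_description": "Restaurants"},
]
income_per_tract_usa_copy = [
    {"tract_id": "10003000200", "average_tract_income": 61000},
    {"tract_id": "10001040100", "average_tract_income": 48000},
]


def left_join(left_rows, right_rows, key, keep):
    """Keep every left row and add the `keep` columns from the matching right row.

    Left rows without a match get None for the added columns.
    """
    index = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = index.get(row[key], {})
        joined.append({**row, **{column: match.get(column) for column in keep}})
    return joined


# Join merchant details on merchant_id, then tract income on tract_id.
with_merchants = left_join(
    merchant_census_tracts, merchants_by_state,
    key="merchant_id", keep=["merchant_subsector_description"],
)
merchants_with_tract_income = left_join(
    with_merchants, income_per_tract_usa_copy,
    key="tract_id", keep=["average_tract_income"],
)

for row in merchants_with_tract_income:
    print(row)
```

Merchant M3 has no matching tract in the toy income data, so a left join keeps the row with an empty income value instead of dropping it.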

You can also create a bar chart to display the average income for each merchant subsector.

The Charts tab of the dataset showing a bar chart displaying the census tracts’ average income for each merchant sector.
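The aggregation behind such a bar chart amounts to a group-by average. Here is a minimal sketch under stated assumptions: the column name merchant_subsector_description is a guess at the dataset's schema, the rows are invented, and rows with a missing income are simply skipped.

```python
from collections import defaultdict
from statistics import mean

# Invented rows standing in for merchants_with_tract_income.
rows = [
    {"merchant_subsector_description": "Gas stations", "average_tract_income": 61000},
    {"merchant_subsector_description": "Gas stations", "average_tract_income": 49000},
    {"merchant_subsector_description": "Restaurants", "average_tract_income": 48000},
    {"merchant_subsector_description": "Restaurants", "average_tract_income": None},
]


def average_income_by_subsector(rows):
    """Return the mean tract income per subsector, ignoring missing incomes."""
    groups = defaultdict(list)
    for row in rows:
        income = row["average_tract_income"]
        if income is not None:
            groups[row["merchant_subsector_description"]].append(income)
    return {subsector: mean(values) for subsector, values in groups.items()}


averages = average_income_by_subsector(rows)
print(averages)
```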

Learn More

Great job! Now you have some hands-on experience working with a plugin recipe. This is just a first step in working with plugins. You can try using other components in the Census USA plugin, such as the dataset connectors. You can also install plugins that include other kinds of components and try using them in your workflow.

Be sure to register for the Academy Plugin Store course to validate the knowledge gained from this tutorial.