Visually Build Out the Data Pipeline

Organizing data pipelines to transform, prepare, and analyze data is critical for production-ready AI projects. The Dataiku DSS visual Flow lets coders and non-coders alike build data pipelines from datasets, recipes that join and transform those datasets, and predictive models.

In this section, we’ll build out the data pipeline by joining two datasets, first with a visual recipe and then with a code recipe, so that we can compare the two approaches.

Join Datasets Using a Visual Recipe

Let’s look at how the flight and airport data were joined using a visual recipe.

  • Go to the Flow.

  • Double-click the Join recipe that was used to create flight_info_joined to open it.

In the Join step, we can see that a left join was used to match the origin and destination values in the flight data against the IATA_FAA column in the airport dataset.
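To make the left join concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the airport dataset is joined twice, once for the origin and once for the destination. The table names flights and airports, the flight_id and name columns, and the sample rows are hypothetical; only origin, destination, and IATA_FAA come from the recipe. This is an illustration of left-join behavior, not the query Dataiku DSS generates.

```python
import sqlite3

# In-memory database standing in for the SQL connection (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical minimal tables: flights carry origin/destination codes,
# airports are keyed by IATA_FAA.
cur.executescript("""
CREATE TABLE flights (flight_id INTEGER, origin TEXT, destination TEXT);
CREATE TABLE airports (IATA_FAA TEXT, name TEXT);
INSERT INTO flights VALUES (1, 'JFK', 'LAX'), (2, 'SFO', 'XXX');
INSERT INTO airports VALUES ('JFK', 'John F Kennedy Intl'),
                            ('LAX', 'Los Angeles Intl'),
                            ('SFO', 'San Francisco Intl');
""")

# Join the airport table twice: once on the origin, once on the destination.
# LEFT JOIN keeps every flight even when a code has no matching airport.
query = """
SELECT f.flight_id,
       o.name AS origin_name,
       d.name AS destination_name
FROM flights AS f
LEFT JOIN airports AS o ON f.origin = o.IATA_FAA
LEFT JOIN airports AS d ON f.destination = d.IATA_FAA
ORDER BY f.flight_id
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 'John F Kennedy Intl', 'Los Angeles Intl')
# (2, 'San Francisco Intl', None)
```

Because the join is a left join, flight 2 is kept even though its destination code has no matching airport; its destination column is simply empty (NULL). An inner join would have dropped that flight entirely.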

  • In the left panel, scroll down to the Output step.

Dataiku DSS lets us convert our Join recipe to a SQL recipe and add it to the Flow. That way, we don’t have to write the query from scratch, and we can still edit it manually later.

  • Click View Query.

We could simply convert this query to a SQL recipe, but we want to keep the visual recipe for comparison. Instead, we’ll use this query to create a new SQL recipe in the Flow.

  • Select the entire query and copy it to the clipboard.

  • Close the query and return to the Flow.

Join Datasets Using a Code Recipe

Let’s create a new SQL recipe using the query we just copied.

  • Click the flight_input_prepared dataset to select it.

  • Open the Actions side panel by clicking the arrow in the top-right corner of the page.

  • In Code recipes, select SQL.

  • Click Create to create a SQL query.

  • Name the output dataset flight_joined_sql.

  • Store it in the same SQL connection as the input dataset, then click Create Dataset.

  • Click Create Recipe.

  • Replace the code with the query you copied in the previous section.

  • Save and Run the recipe, accepting the schema update if prompted.

Dataiku DSS runs the recipe and builds the flight_joined_sql dataset.

  • Return to the Flow.
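With both flight_info_joined and flight_joined_sql in the Flow, one way to compare the visual and code recipes is to check that their outputs contain the same rows. The sketch below is a hypothetical check written with Python's sqlite3 and small stand-in tables; the outputs_match helper, the column names, and the sample rows are illustrative assumptions. It also assumes both outputs have the same columns in the same order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical stand-ins for the visual-recipe and SQL-recipe outputs.
cur.executescript("""
CREATE TABLE flight_info_joined (flight_id INTEGER, origin_name TEXT);
CREATE TABLE flight_joined_sql (flight_id INTEGER, origin_name TEXT);
INSERT INTO flight_info_joined VALUES (1, 'John F Kennedy Intl'), (2, NULL);
INSERT INTO flight_joined_sql VALUES (1, 'John F Kennedy Intl'), (2, NULL);
""")

def count_rows(cur: sqlite3.Cursor, table: str) -> int:
    """Return the number of rows in a table."""
    return cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def rows_only_in(cur: sqlite3.Cursor, left: str, right: str) -> int:
    """Count distinct rows found in `left` but not in `right`."""
    # EXCEPT compares whole rows; SQLite treats NULLs as equal here.
    sql = f"SELECT COUNT(*) FROM (SELECT * FROM {left} EXCEPT SELECT * FROM {right})"
    return cur.execute(sql).fetchone()[0]

def outputs_match(cur: sqlite3.Cursor, a: str, b: str) -> bool:
    """Quick check: equal row counts and no distinct rows unique to either side.

    EXCEPT works on distinct rows, so this is not a strict duplicate-aware
    comparison.
    """
    return (count_rows(cur, a) == count_rows(cur, b)
            and rows_only_in(cur, a, b) == 0
            and rows_only_in(cur, b, a) == 0)

identical_before = outputs_match(cur, "flight_info_joined", "flight_joined_sql")
print(identical_before)  # True

# Introduce a difference to show the check catching it.
cur.execute("UPDATE flight_joined_sql SET origin_name = 'X' WHERE flight_id = 2")
identical_after = outputs_match(cur, "flight_info_joined", "flight_joined_sql")
print(identical_after)  # False
```

Checking in both directions matters: a row that appears only in the SQL output would be missed by a one-sided EXCEPT.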