Visually Build Out the Data Pipeline
Organizing data pipelines to transform, prepare, and analyze data is critical for production-ready AI projects. The Dataiku DSS visual flow allows coders and non-coders alike to build data pipelines from datasets, recipes that join and transform those datasets, and predictive models.
In this section, we’ll build out the data pipeline by joining two datasets, first with a visual recipe and then with a code recipe. This will allow us to compare the two approaches.
Join Datasets Using a Visual Recipe
Let’s discover how the flight and airport data was joined using a visual recipe.
Go to the Flow.
Double-click the Join recipe that was used to create flight_info_joined to open it.
In the Join step, we can see that a left join was used to join the datasets on the origin and destination values, which match the IATA_FAA column in the airport dataset.
In the left panel, scroll down to the Output step.
Dataiku DSS lets us convert our Join recipe to an SQL recipe and add it to the Flow. That way, we don’t have to write the query from scratch and we can manually edit it later.
Click View Query.
We could simply convert the Join recipe to an SQL recipe, but we want to keep the visual recipe for comparison purposes. Instead, we’ll copy this query and use it to create a new SQL recipe in the Flow.
Select the entire query and copy it to the clipboard.
Close the query and return to the Flow.
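To make the left join described in the Join step concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the query that Dataiku DSS generates. It assumes the airport dataset is joined twice, once on the origin value and once on the destination value. The `airports` table name, the `flight_id` and `name` columns, and the sample rows are hypothetical; only `flight_input_prepared`, the origin and destination values, and `IATA_FAA` come from the text above.

```python
import sqlite3

# In-memory database standing in for the project's SQL connection (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flight_input_prepared (flight_id INTEGER, origin TEXT, destination TEXT);
    CREATE TABLE airports (IATA_FAA TEXT, name TEXT);  -- hypothetical table name
    INSERT INTO flight_input_prepared VALUES (1, 'JFK', 'LAX'), (2, 'SFO', 'XXX');
    INSERT INTO airports VALUES
        ('JFK', 'John F Kennedy Intl'),
        ('LAX', 'Los Angeles Intl'),
        ('SFO', 'San Francisco Intl');
""")

# A left join keeps every flight row, even when an airport code has no match.
LEFT_JOIN_QUERY = """
    SELECT f.flight_id,
           f.origin,      o.name AS origin_name,
           f.destination, d.name AS destination_name
    FROM flight_input_prepared AS f
    LEFT JOIN airports AS o ON f.origin = o.IATA_FAA
    LEFT JOIN airports AS d ON f.destination = d.IATA_FAA
    ORDER BY f.flight_id
"""

rows = conn.execute(LEFT_JOIN_QUERY).fetchall()
for row in rows:
    print(row)
```

In this sketch, the second flight has a destination code with no matching airport, so its `destination_name` comes back empty (`None` in Python) while the flight row itself is preserved. That is the behavior that distinguishes a left join from an inner join.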
Join Datasets Using a Code Recipe
Let’s create a new SQL recipe using the query we just copied.
Click the dataset, flight_input_prepared, to select it.
Open the side panel by clicking the arrow at the top right corner of the page to view Actions.
In Code recipes, select SQL.
Click Create to create an SQL query.
Name the output dataset flight_joined_sql. Store it into the same SQL connection as the input dataset, then click Create Dataset.
Click Create Recipe.
Replace the code with the query you copied in the previous section.
Save and Run the recipe, accepting the schema update if prompted.
Dataiku DSS runs the recipe and builds the output dataset.
Return to the Flow.
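Because the SQL recipe reuses the query from the visual recipe, the two output datasets, flight_info_joined and flight_joined_sql, would be expected to hold the same rows. One way to check this is sketched below with Python's built-in sqlite3 module. The `tables_match` helper, the table columns, and the sample rows are hypothetical, and an in-memory database stands in for the real SQL connection.

```python
import sqlite3


def tables_match(conn, left, right):
    """Return True when both tables have the same row count and the same distinct rows.

    Table names are interpolated directly into the SQL text, so pass only
    trusted identifiers.
    """
    count_left = conn.execute(f"SELECT COUNT(*) FROM {left}").fetchone()[0]
    count_right = conn.execute(f"SELECT COUNT(*) FROM {right}").fetchone()[0]
    # EXCEPT returns the distinct rows found in the first table but not the second.
    only_left = conn.execute(
        f"SELECT * FROM {left} EXCEPT SELECT * FROM {right}"
    ).fetchall()
    only_right = conn.execute(
        f"SELECT * FROM {right} EXCEPT SELECT * FROM {left}"
    ).fetchall()
    return count_left == count_right and not only_left and not only_right


# Hypothetical sample data standing in for the two recipe outputs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flight_info_joined (flight_id INTEGER, origin TEXT, origin_name TEXT);
    CREATE TABLE flight_joined_sql (flight_id INTEGER, origin TEXT, origin_name TEXT);
    INSERT INTO flight_info_joined VALUES (1, 'JFK', 'John F Kennedy Intl'), (2, 'XXX', NULL);
    INSERT INTO flight_joined_sql VALUES (2, 'XXX', NULL), (1, 'JFK', 'John F Kennedy Intl');
""")

same_before = tables_match(conn, "flight_info_joined", "flight_joined_sql")

# Add a row to only one table to show that a difference is detected.
conn.execute("INSERT INTO flight_joined_sql VALUES (3, 'SFO', 'San Francisco Intl')")
same_after = tables_match(conn, "flight_info_joined", "flight_joined_sql")

print(same_before, same_after)
```

The check ignores row order, which matters because SQL engines do not guarantee an ordering without an explicit ORDER BY clause. It compares row counts and distinct rows only, so it is a quick sanity check and not a full multiset comparison.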