Tutorial | Export data from a filtered dataset
In this tutorial, you’ll get hands-on practice filtering a dataset in the Explore tab and exporting the filtered results to a new dataset in the Flow.
You’ll need access to Dataiku version 11.3 or above (the free edition is enough). To get the free edition, visit the Get Started page.
We’ll be working with a fictitious dataset that contains credit card transactions for different merchant subsectors. The transactions are flagged as either authorized or unauthorized. Our goal is to analyze unauthorized transactions for a specific merchant subsector, in this case, the “gas” subsector.
To do this, we’ll filter the data using the authorized_flag column and the merchant_subsector_description column. We’ll export our filtered results as a new dataset in the Flow. We can do all of this in the Explore tab of our dataset.
Create the project
In this section, we’ll create the project by opening a tutorial with a completed Flow.
To open the tutorial:
Sign in to your instance of Dataiku.
From the Dataiku homepage, click on +New Project > DSS Tutorials > General Topics > Geospatial Analysis: CC Fraud Data.
You can also download the starter project from this website and import it as a zip file.
Dataiku opens the Summary tab of the project, also known as the project homepage.
Click Go To Flow.
In the lower right corner, click Flow Actions, then select Build all.
Wait while Dataiku builds the Flow, then refresh your browser window.
Export filtered results
You’ll notice the project contains more than one Flow zone. This lesson will focus on the transactions_joined dataset in the Default Flow zone. This dataset has been prepared from cardholder transactions for 2017 and 2018, merchant information, and cardholder information.
Open the transactions_joined dataset to view it in the Explore tab.
Filter the results
Let’s filter our data to display only the records we want to analyze.
Apply the following filters to the transactions_joined dataset:
In the authorized_flag column, keep only the records where the value equals zero.
In the merchant_subsector_description column, keep only the records belonging to the gas subsector.
In the search query, type 695 to keep only the records where merchant_category_id equals 695.
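For readers who find it easier to reason about filters as code, the three conditions above can be sketched in plain Python. This is only an illustration of the filtering logic, not how Dataiku implements it: the sample rows are made up, only the three column names come from this tutorial, and the search query is modeled as an exact match on merchant_category_id, which is an assumption.

```python
# Hypothetical sample rows; only the three column names come from the tutorial.
rows = [
    {"authorized_flag": 0, "merchant_subsector_description": "gas", "merchant_category_id": 695},
    {"authorized_flag": 1, "merchant_subsector_description": "gas", "merchant_category_id": 695},
    {"authorized_flag": 0, "merchant_subsector_description": "internet", "merchant_category_id": 278},
    {"authorized_flag": 0, "merchant_subsector_description": "gas", "merchant_category_id": 511},
]


def keep_row(row):
    """Return True when a row passes all three filters described above."""
    return (
        row["authorized_flag"] == 0  # unauthorized transactions only
        and row["merchant_subsector_description"] == "gas"  # gas subsector only
        and row["merchant_category_id"] == 695  # assumed meaning of the search query
    )


# Keep only the rows that satisfy every condition at once.
filtered = [row for row in rows if keep_row(row)]
print(len(filtered), "of", len(rows), "rows kept")
```

The conditions are combined with a logical AND, so each filter narrows the result of the previous one.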
Visualize the filtered results
To visualize the unauthorized transactions for the gas subsector:
Click on the Charts tab.
Dataiku retains the filters from the Explore tab, allowing you to create visualizations easily.
Return to the Explore tab.
Export the filtered results to the Flow
We are now ready to export our filtered results to the Flow.
Click Actions in the top right corner of the Explore tab.
Choose Export then Export to Dataset.
Select Apply column filters and search query defined for dataset.
Name the new dataset
Click Create Dataset.
You’ll notice there are three tabs in the Export window. You can choose to download your dataset to your computer, export the dataset to the Flow, or choose a format for exporting the data (such as Tableau hyper extract or Excel).
Dataiku exports the data, creating a new dataset in the Flow. You can now further analyze the data.
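As a rough mental model of the export step, the sketch below writes either the filtered rows or all rows to a CSV target, depending on a flag that stands in for the Apply column filters and search query option. It uses only the Python standard library rather than any Dataiku API. The function name, the apply_filters parameter, and the sample rows are hypothetical, and the assumption that leaving the option unchecked exports every row is ours.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the tutorial.
sample_rows = [
    {"authorized_flag": 0, "merchant_subsector_description": "gas", "merchant_category_id": 695},
    {"authorized_flag": 1, "merchant_subsector_description": "gas", "merchant_category_id": 695},
    {"authorized_flag": 0, "merchant_subsector_description": "internet", "merchant_category_id": 278},
]


def is_unauthorized_gas(row):
    """Filter from this tutorial: unauthorized transactions in the gas subsector."""
    return (
        row["authorized_flag"] == 0
        and row["merchant_subsector_description"] == "gas"
    )


def export_rows(rows, predicate, out_file, apply_filters=True):
    """Write rows to out_file as CSV and return the number of data rows written.

    apply_filters is a hypothetical stand-in for the export dialog option:
    when True only rows matching predicate are written, otherwise all rows.
    """
    if not rows:
        return 0
    selected = [r for r in rows if predicate(r)] if apply_filters else list(rows)
    writer = csv.DictWriter(out_file, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(selected)
    return len(selected)


# An in-memory buffer stands in for the new dataset's storage.
buffer = io.StringIO()
written = export_rows(sample_rows, is_unauthorized_gas, buffer)
print(buffer.getvalue())
```

Calling export_rows with apply_filters=False writes every sample row, which mirrors what we assume happens when the option is left unchecked.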