Concept | Analyze data quality in the Explore tab
Column analysis
From the Explore tab of a dataset, you can investigate the values of any column using the Analyze window. To open it, click the column header and select Analyze from the context menu.
Analyze window overview
This section presents the different elements in the Analyze window.
Sample management
By default, Dataiku computes the statistics shown in the Analyze window on the current sample configured in the Explore tab. You can also compute them on the whole dataset.
Tip
To include a filter within the sample summary statistics, see How-to | Apply a filter to summary statistics in the Analyze window.
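The difference between sample-based and whole-dataset statistics can be illustrated outside Dataiku with a minimal pandas sketch. This is not Dataiku's implementation: the `price` column, the data, and the "first 100 rows" sample are all hypothetical stand-ins chosen for illustration.

```python
import pandas as pd

# Hypothetical dataset: 1,000 rows with a numeric "price" column.
full = pd.DataFrame({"price": range(1, 1001)})

# Stand-in for a configured sample: here, simply the first 100 rows.
sample = full.head(100)

# The same statistic, computed on the sample and on the whole dataset.
sample_mean = sample["price"].mean()
full_mean = full["price"].mean()

print(f"mean on sample: {sample_mean}, mean on whole dataset: {full_mean}")
```

When the sample is not representative of the whole dataset, as in this deliberately skewed example, the two values can differ widely, which is why it matters to know which scope the statistics were computed on.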
Tab description
The Analyze window opens on either the Categorical or the Numerical tab, depending on the type of the column you're analyzing.
The table below describes the different tabs.
Tab | Description |
---|---|
Categorical (for categorical columns) | Plots a bar chart, sorted by the most frequent observations. |
Numerical (for numeric columns) | Plots a histogram and boxplot of the distribution. The tab also provides summary statistics, counts of the most frequent values, and detected outliers. |
Values clustering | Finds groups of similar values so that you can standardize text fields that might have unwanted variation. |
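To give an intuition for what grouping similar values means, here is a rough Python sketch that clusters near-duplicate strings using the standard library's `difflib.SequenceMatcher`. It is only an illustration of the general idea: the `cluster_values` helper, the normalization step, and the 0.8 threshold are assumptions made for this example, not the algorithm Dataiku uses.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase the value and collapse repeated or trailing whitespace."""
    return " ".join(value.lower().split())


def cluster_values(values, threshold=0.8):
    """Greedily group values whose normalized form is similar to the
    first member of an existing cluster (hypothetical helper)."""
    clusters = []
    for value in values:
        key = normalize(value)
        for cluster in clusters:
            representative = normalize(cluster[0])
            if SequenceMatcher(None, key, representative).ratio() >= threshold:
                cluster.append(value)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([value])
    return clusters


values = ["New York", "new york", "New  York", "Nwe York", "Boston", "boston "]
clusters = cluster_values(values)
for cluster in clusters:
    print(cluster)
```

Each resulting cluster is a candidate for merging into a single canonical value, which is the kind of cleanup the Values clustering tab is meant to support.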
Whether you’re in the Categorical or Numerical tab, a Summary section lets you easily review the data quality of the selected column. It reveals the number of valid, invalid, and empty values, as well as unique values that appear only once.
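The counts in the Summary section can be approximated with pandas to make their definitions concrete. The sketch below is an assumption-laden illustration, not how Dataiku computes them: the data is made up, and "valid" is taken to mean "parses as a number", as a stand-in for whatever type or meaning the column is expected to have.

```python
import pandas as pd

# Hypothetical raw column that is expected to hold numbers.
raw = pd.Series(["12", "7", "", "abc", "7", None, "3.5"])

# Empty: missing or blank cells.
is_empty = raw.isna() | (raw.fillna("").str.strip() == "")

# Valid: non-empty cells that parse as a number.
# Invalid: non-empty cells that do not.
parsed = pd.to_numeric(raw, errors="coerce")
is_valid = ~is_empty & parsed.notna()
is_invalid = ~is_empty & parsed.isna()

# Values that appear exactly once among the non-empty cells.
counts = raw[~is_empty].value_counts()

summary = {
    "valid": int(is_valid.sum()),
    "invalid": int(is_invalid.sum()),
    "empty": int(is_empty.sum()),
    "appear_once": int((counts == 1).sum()),
}
print(summary)
```

A high share of invalid or empty cells, or an unexpectedly large number of values that appear only once, is usually the first signal that a column needs cleaning.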
What's next?
In this article, you learned how to access the Analyze window from the Explore tab of a dataset to see statistics and the quality of a data sample. Continue learning about the basics of Dataiku with this charts concept.