Concept | Analyze data quality in the Explore tab


Column analysis

From the Explore tab of a dataset, you can investigate the values of any column using the Analyze window. To open it, click the column header and select Analyze from the context menu.

A Dataiku screenshot showing how to use the Analyze window to access and view statistics on a data sample.

Analyze window overview

This section presents the different elements in the Analyze window.

Sample management

Keep in mind that, by default, Dataiku computes the statistics shown in the Analyze window on the sample currently configured in the Explore tab, not on the full dataset. You can also choose to compute them on the whole dataset, which gives exact figures at the cost of processing every row.
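To see why the choice of sample matters, the plain-Python sketch below compares a statistic computed on a "first N records" style sample with the same statistic computed on all rows. It is an illustration only, not Dataiku code: the helper `first_records_sample` and the `amounts` column are hypothetical, and the data is deliberately ordered so that the head of the file is not representative.

```python
import statistics


def first_records_sample(rows, n):
    """Hypothetical helper: keep only the first n rows, like a head-of-file sample."""
    return rows[:n]


# Hypothetical column whose values increase through the file (e.g. rows sorted by date).
amounts = list(range(1, 101))  # values 1..100

sample = first_records_sample(amounts, 10)

# The sample mean only reflects the first rows; the full-data mean reflects all of them.
print(statistics.mean(sample))   # 5.5
print(statistics.mean(amounts))  # 50.5
```

When data is ordered like this, statistics on a head sample can be far from the true values, which is one reason to recompute on the whole dataset before drawing conclusions.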

Dataiku screenshot of the Analyze window.

Tab description

The Analyze window opens on either the Categorical or the Numerical tab, depending on the type of the column you're analyzing.

Dataiku screenshot of the tabs in the Analyze window.

The table below describes the different tabs.

| Tab | Description |
| --- | --- |
| Categorical (for categorical columns) | Plots a bar chart, sorted by the most frequent observations. |
| Numerical (for numeric columns) | Plots a histogram and boxplot of the distribution. The tab also provides summary statistics, counts of the most frequent values, and detected outliers. |
| Values clustering | Finds groups of similar values, helping you standardize text fields that have unwanted variation. |
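To make the idea behind values clustering concrete, here is a minimal plain-Python sketch of one common technique, fingerprint key collision (popularized by OpenRefine). This is a swapped-in illustration: it is not necessarily the algorithm Dataiku uses, and the functions `fingerprint` and `cluster_values` are hypothetical. Each value is reduced to a normalized key, and values that share a key are grouped as likely variants of one another.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Normalize a value: strip accents, lowercase, drop punctuation, sort unique tokens."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = ascii_only.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster_values(values):
    """Group values whose fingerprints collide; keep only groups with 2+ spellings."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


cities = ["New York", "new york", "York, New", "Montréal", "Montreal", "Boston"]
print(cluster_values(cities))
# [['New York', 'York, New', 'new york'], ['Montreal', 'Montréal']]
```

Each group is a candidate for standardization: you would pick one spelling (for example "New York") and replace the others with it.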

In both the Categorical and Numerical tabs, a Summary section lets you review the data quality of the selected column. It shows the number of valid, invalid, and empty values, as well as the number of values that appear only once.
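The plain-Python sketch below shows how the Summary counts relate to one another for a single column. It is a simplified illustration, not Dataiku's implementation. It assumes that missing or blank strings count as empty, and that validity is judged by a hypothetical check `is_decimal`, which stands in for whatever type or meaning the column is expected to have.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def is_decimal(value: str) -> bool:
    """Hypothetical validity check: a value is valid if it parses as a number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def summarize(values: Iterable[Optional[str]], is_valid: Callable[[str], bool]) -> dict:
    """Count valid, invalid, and empty values, plus distinct values seen exactly once."""
    valid = invalid = empty = 0
    non_empty = []
    for v in values:
        if v is None or v.strip() == "":
            empty += 1  # missing or blank cells
            continue
        non_empty.append(v)
        if is_valid(v):
            valid += 1
        else:
            invalid += 1
    counts = Counter(non_empty)
    appear_once = sum(1 for c in counts.values() if c == 1)
    return {"valid": valid, "invalid": invalid, "empty": empty, "appear_once": appear_once}


column = ["3.5", "7", "abc", "", None, "7", "2.0"]
print(summarize(column, is_decimal))
# {'valid': 4, 'invalid': 1, 'empty': 2, 'appear_once': 3}
```

Note that valid, invalid, and empty together account for every row, while the "appear once" count is computed over distinct non-empty values.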

What’s next?

In this article, you learned how to open the Analyze window from the Explore tab of a dataset to review statistics and data quality for a column. Continue learning about the basics of Dataiku with the concept article on charts.