Concept | Generate statistics recipe

There are a few ways to generate and view the statistics of a dataset in Dataiku. One of them, the Generate statistics recipe, allows you to embed your statistics in the Flow.

Dataiku screenshot highlighting the Generate statistics icon in the right panel of the Flow.

Use cases

By embedding a Generate statistics recipe in the Flow, you can:

  • Post-process your statistics.

  • Automate the execution of your statistical tests.

  • Base further data processing on the features or columns that passed a certain test.
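As a rough illustration of the last use case, the sketch below keeps only the columns whose test result falls under a significance threshold. It uses plain Python dictionaries in place of a real output dataset, and the field names `column` and `p_value` are assumptions made for this example, not the actual schema produced by the recipe.

```python
# Hypothetical rows standing in for the results of several statistical tests.
# The field names ("column", "p_value") are assumptions for illustration,
# not the actual schema of a Generate statistics output dataset.
test_results = [
    {"column": "age", "p_value": 0.003},
    {"column": "region", "p_value": 0.410},
    {"column": "plan_type", "p_value": 0.021},
]

ALPHA = 0.05  # significance level chosen for this example


def columns_passing(results, alpha):
    """Return the names of columns whose p-value is below alpha."""
    return [row["column"] for row in results if row["p_value"] < alpha]


selected = columns_passing(test_results, ALPHA)
print(selected)  # ['age', 'plan_type']
```

In a Flow, a downstream recipe could apply the same kind of filter to the statistics output and then restrict further processing to the selected columns.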

Generating statistics in the Flow

Once you run a statistics recipe, both the recipe and its output dataset are visible in the Flow.

Dataiku screenshot of a Flow with two statistics recipes: one chi-square test and one univariate analysis.

The format of the output dataset depends on the type of statistics recipe. For instance, the univariate analysis output dataset includes rows with general statistics such as mean, median, and standard deviation, as well as rows with quantile information.

In contrast, the chi-square output dataset includes only one row, which holds information such as the p-value and the test conclusion.
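To make the difference in shape concrete, here is a minimal sketch in plain Python that builds a multi-row univariate result and a single-row chi-square result. The data is made up, the field names are hypothetical rather than Dataiku's actual output schema, and the chi-square helper handles only a 2x2 table without continuity correction, which may differ from what the recipe computes.

```python
import math
import statistics

# Made-up sample of a numeric column.
values = [12.0, 15.5, 9.0, 22.0, 18.5, 14.0, 30.0, 11.5]

# Univariate analysis: several rows, one per statistic or quantile.
# Field names are hypothetical, not Dataiku's actual output schema.
q1, q2, q3 = statistics.quantiles(values, n=4)
univariate_rows = [
    {"statistic": "mean", "value": statistics.mean(values)},
    {"statistic": "median", "value": statistics.median(values)},
    {"statistic": "std_dev", "value": statistics.stdev(values)},
    {"statistic": "quantile_25", "value": q1},
    {"statistic": "quantile_50", "value": q2},
    {"statistic": "quantile_75", "value": q3},
]


def chi_square_2x2(observed, alpha=0.05):
    """Pearson chi-square test of independence on a 2x2 table.

    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (count - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "statistic": chi2,
        "degrees_of_freedom": 1,
        "p_value": p_value,
        "conclusion": "dependent" if p_value < alpha else "no evidence of dependence",
    }


# Chi-square test: the whole result fits in a single row.
chi_square_rows = [chi_square_2x2([[30, 10], [20, 40]])]

print(len(univariate_rows), "univariate rows")
print(len(chi_square_rows), "chi-square row:", chi_square_rows[0])
```

The univariate result is naturally "long" (one row per measure), while the test result is a single record, which is why downstream processing usually has to be written per recipe type.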

Recipe configuration

When creating a Generate statistics recipe, you can choose from a set of statistical tests.

Dataiku screenshot of the univariate analysis recipe configuration page.

The settings differ slightly from one statistical test to another. As an example, here is the univariate analysis recipe configuration:

Dataiku screenshot of the univariate analysis recipe configuration page.

As with other visual recipes, you can filter and sample the dataset within the recipe.

Additionally, you can choose what information you would like to see in the output table, such as:

  • Summary statistics

  • Frequency table values

  • Quantile table values

  • Confidence intervals
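For readers less familiar with some of these outputs, the sketch below computes a frequency table for a categorical column and a 95% confidence interval for the mean of a numeric column, using only the Python standard library. The data is invented, and the interval uses a normal approximation chosen for simplicity; the source does not state which method the recipe applies.

```python
import math
import statistics
from collections import Counter

# Made-up categorical column.
plans = ["basic", "premium", "basic", "basic", "trial", "premium"]

# Frequency table: each distinct value with its count, most frequent first.
frequency_table = Counter(plans).most_common()
print(frequency_table)  # [('basic', 3), ('premium', 2), ('trial', 1)]

# Made-up numeric column.
amounts = [12.0, 15.5, 9.0, 22.0, 18.5, 14.0, 30.0, 11.5]

# 95% confidence interval for the mean, using a normal approximation
# (an assumption of this sketch, not necessarily the recipe's method).
n = len(amounts)
mean = statistics.mean(amounts)
std_err = statistics.stdev(amounts) / math.sqrt(n)
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
ci_low, ci_high = mean - z * std_err, mean + z * std_err

print(f"mean={mean:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```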

You can also choose a split column to perform univariate analysis on subgroups of a variable.
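Conceptually, a split column works like a group-by: rows are partitioned by the values of that column, and the same statistics are computed for each subgroup. The sketch below shows the idea in plain Python. The `segment` column and the sample values are hypothetical; `purchase_amount` is used only as an illustrative variable name.

```python
import statistics
from collections import defaultdict

# Made-up rows; "segment" is a hypothetical split column.
rows = [
    {"segment": "new", "purchase_amount": 12.0},
    {"segment": "returning", "purchase_amount": 40.0},
    {"segment": "new", "purchase_amount": 18.0},
    {"segment": "returning", "purchase_amount": 50.0},
    {"segment": "new", "purchase_amount": 15.0},
]

SPLIT_COLUMN = "segment"
VALUE_COLUMN = "purchase_amount"

# Partition the analyzed values by the split column.
groups = defaultdict(list)
for row in rows:
    groups[row[SPLIT_COLUMN]].append(row[VALUE_COLUMN])

# Compute the same summary statistics for every subgroup.
summary = {
    name: {
        "count": len(vals),
        "mean": statistics.mean(vals),
        "median": statistics.median(vals),
    }
    for name, vals in groups.items()
}

for name, stats in summary.items():
    print(name, stats)
```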

The options for univariate analysis in the recipe closely match those in the univariate analysis statistics card configuration:

Dataiku screenshot of the univariate analysis statistics card settings for the purchase_amount variable.

What’s next?

To learn more about statistical tests in Dataiku, visit the reference documentation on Statistical Tests.