Concept | Generate statistics recipe#
There are a few ways to generate and view the statistics of a dataset in Dataiku:
Open the Analyze window to analyze the columns in your dataset.
Create a statistics worksheet and cards.
Use the Generate statistics recipe.
The Generate statistics recipe allows you to embed your statistics in the Flow.
Use cases#
By embedding a Generate statistics recipe in the Flow, you can:
Post-process your statistics.
Automate the execution of your statistics tests.
Base further data processing on the features or columns that passed a certain test.
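As a rough illustration of the last use case, the sketch below keeps only the columns whose test result falls under a chosen significance level. It is plain Python on made-up rows. The field names `column` and `p_value`, the helper `columns_passing`, and the 0.05 threshold are assumptions for this example, not the actual schema of a Generate statistics output dataset.

```python
# Hypothetical rows standing in for the results of several statistical tests.
# The field names ("column", "p_value") are assumptions for illustration only.
test_results = [
    {"column": "age", "p_value": 0.003},
    {"column": "income", "p_value": 0.21},
    {"column": "tenure", "p_value": 0.04},
]

ALPHA = 0.05  # significance level chosen for this example


def columns_passing(results, alpha=ALPHA):
    """Return the names of columns whose p-value is strictly below alpha."""
    return [row["column"] for row in results if row["p_value"] < alpha]


# Downstream steps could then restrict processing to these columns.
selected = columns_passing(test_results)
print(selected)
```

In a Flow, logic of this kind would typically sit in a recipe downstream of the statistics output dataset, so that it reruns whenever the statistics are rebuilt.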
Generating statistics in the Flow#
Once you run a Generate statistics recipe, both the recipe and its output dataset appear in the Flow.
The format of the output dataset depends on the type of statistics recipe. For instance, the univariate analysis output dataset includes rows with general statistics such as mean, median, and standard deviation, as well as rows with quantile information.
By contrast, the chi-square output dataset contains a single row that includes the p-value, the test conclusion, and more.
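To make the difference in shape concrete, here is a minimal sketch in plain Python that builds both kinds of result by hand: several rows for a univariate analysis and a single row for a chi-square test. The sample values, the 2x2 contingency table, and all field names are invented for illustration and do not reproduce the column names Dataiku actually writes. The chi-square statistic is the plain Pearson statistic without continuity correction, and the p-value formula used is valid only for one degree of freedom.

```python
import math
import statistics

# Made-up numeric sample for the univariate case.
values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points

# Univariate-style result: one row per statistic or quantile.
univariate_rows = [
    {"statistic": "mean", "value": statistics.mean(values)},
    {"statistic": "median", "value": statistics.median(values)},
    {"statistic": "std_dev", "value": statistics.stdev(values)},
    {"statistic": "quantile_0.25", "value": q1},
    {"statistic": "quantile_0.75", "value": q3},
]

# Made-up 2x2 contingency table for the chi-square case.
observed = [[20, 10], [10, 20]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2)
    for j in range(2)
)

# With 1 degree of freedom, the p-value equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Test-style result: a single row summarizing the whole test.
chi_square_row = {"statistic": chi2, "p_value": p_value, "rejected": p_value < 0.05}

print(len(univariate_rows), "univariate rows")
print(chi_square_row)
```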
Recipe configuration#
When creating a Generate statistics recipe, you’ll be able to choose from a set of statistics tests.
While the settings differ slightly from one statistics test to another, the univariate analysis recipe configuration is a representative example.
As with other visual recipes, you can filter and sample the dataset within the recipe.
Additionally, you can choose what information you would like to see in the output table, such as:
Summary statistics
Frequency table values
Quantile table values
Confidence intervals
You can also choose a split column to perform univariate analysis on subgroups of a variable.
The options for univariate analysis in the recipe closely mirror those in the univariate analysis statistics card configuration.
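The effect of a split column can be sketched outside Dataiku as a group-by followed by per-group summary statistics. The example below uses only the Python standard library. The records, the group labels `A` and `B`, and the output field names are assumptions chosen for illustration, not the recipe's actual output.

```python
import statistics
from collections import defaultdict

# Made-up records: (value of the split column, value of the analyzed column).
records = [("A", 10.0), ("A", 12.0), ("A", 14.0), ("B", 20.0), ("B", 26.0)]

# Collect the analyzed values for each subgroup of the split column.
groups = defaultdict(list)
for group, value in records:
    groups[group].append(value)

# Compute the same summary statistics independently for each subgroup.
summary = {
    group: {
        "count": len(vals),
        "mean": statistics.mean(vals),
        "median": statistics.median(vals),
    }
    for group, vals in groups.items()
}

for group, stats in sorted(summary.items()):
    print(group, stats)
```

Each subgroup gets its own set of statistics, which is why the amount of output grows with the number of distinct values in the split column.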
What’s next?#
To learn more about statistical tests in Dataiku, visit the reference documentation on Statistical Tests.