Concept | Statistics cards for fit curves and distributions


Distribution fitting

The Fit Distribution card in Dataiku fits univariate distributions, such as the normal (Gaussian), exponential, and beta distributions, to the data in each numerical column of a dataset.
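To make the idea concrete, here is a minimal Python sketch that fits two of these distributions to a single numerical column by maximum likelihood and compares them with the Akaike information criterion (AIC). It is not Dataiku's implementation. The function names, the sample column, and the choice of AIC as the comparison metric are illustrative assumptions. The sketch relies on the closed-form maximum likelihood estimates for the normal and exponential distributions, so it needs only the standard library.

```python
import math
from statistics import fmean


def fit_normal(xs):
    """Maximum likelihood estimates for a normal distribution: mean and 1/n standard deviation."""
    mu = fmean(xs)
    sigma = math.sqrt(fmean([(x - mu) ** 2 for x in xs]))
    return mu, sigma


def normal_loglik(xs, mu, sigma):
    """Total log-density of xs under N(mu, sigma^2)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma**2) - ss / (2 * sigma**2)


def fit_exponential(xs):
    """Maximum likelihood rate of an exponential distribution with location 0 (requires xs > 0)."""
    return 1.0 / fmean(xs)


def exponential_loglik(xs, rate):
    """Total log-density of xs under Exp(rate)."""
    return len(xs) * math.log(rate) - rate * sum(xs)


def aic(loglik, n_params):
    """Akaike information criterion; lower values mean a better trade-off of fit and complexity."""
    return 2 * n_params - 2 * loglik


if __name__ == "__main__":
    # Hypothetical values from one numerical column.
    column = [0.8, 1.1, 0.4, 2.9, 1.7, 0.2, 0.9, 3.8, 1.3, 0.6]

    mu, sigma = fit_normal(column)
    rate = fit_exponential(column)
    results = {
        "normal": (aic(normal_loglik(column, mu, sigma), 2), {"mu": mu, "sigma": sigma}),
        "exponential": (aic(exponential_loglik(column, rate), 1), {"rate": rate}),
    }
    # Print candidates from best (lowest AIC) to worst.
    for name, (score, params) in sorted(results.items(), key=lambda item: item[1][0]):
        print(f"{name:12s} AIC={score:8.3f} params={params}")
```

AIC penalizes the normal distribution for its extra parameter, so a distribution only ranks higher if its improvement in log-likelihood outweighs that penalty.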

The card displays goodness-of-fit metrics, the estimated parameters of each distribution, and a Q-Q plot that compares the quantiles of the data to the quantiles of the fitted distributions. Points that lie far from the identity line (y = x) in a Q-Q plot indicate a poor fit.

../../_images/QQplot.png
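The following Python sketch shows how the points of a Q-Q plot can be computed, together with one common goodness-of-fit metric, the Kolmogorov-Smirnov (KS) statistic. It is an illustration, not the card's actual computation: the plotting position (i - 0.5) / n, the helper names, and the sample column are assumptions, and the normal distribution is fitted with `statistics.NormalDist.from_samples`, which uses the sample mean and sample standard deviation.

```python
from statistics import NormalDist


def qq_points(xs, quantile_fn):
    """Pair each sorted observation with the theoretical quantile at plotting position (i - 0.5) / n."""
    data = sorted(xs)
    n = len(data)
    return [(quantile_fn((i - 0.5) / n), x) for i, x in enumerate(data, start=1)]


def max_qq_deviation(points):
    """Largest vertical distance between a Q-Q point and the identity line y = x."""
    return max(abs(sample - theoretical) for theoretical, sample in points)


def ks_statistic(xs, cdf):
    """Kolmogorov-Smirnov statistic: largest gap between the empirical CDF and the fitted CDF."""
    data = sorted(xs)
    n = len(data)
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    # Hypothetical column of values.
    column = [0.8, 1.1, 0.4, 2.9, 1.7, 0.2, 0.9, 3.8, 1.3, 0.6]
    fitted = NormalDist.from_samples(column)

    points = qq_points(column, fitted.inv_cdf)
    for theoretical, sample in points:
        print(f"theoretical={theoretical:6.3f} sample={sample:6.3f}")
    print("max deviation from y = x:", round(max_qq_deviation(points), 3))
    print("KS statistic:", round(ks_statistic(column, fitted.cdf), 3))
```

Plotting each pair as (theoretical, sample) gives the Q-Q plot. If the fit were perfect, every point would sit on the identity line and the maximum deviation would be zero.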

With the 2D Fit Distribution card, you can also fit a bivariate normal (joint normal) distribution to two numerical variables or, alternatively, visualize their two-dimensional kernel density estimate (2D KDE).

../../_images/2dKDEplot.png
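For the joint normal option, fitting amounts to estimating a mean vector and a 2x2 covariance matrix. The Python sketch below computes the maximum likelihood estimates (with 1/n normalization) and evaluates the fitted density. The function names and sample data are hypothetical, and this is not Dataiku's code.

```python
import math
from statistics import fmean


def fit_bivariate_normal(xs, ys):
    """Maximum likelihood bivariate normal: mean vector and covariance matrix (1/n normalization)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = fmean(xs), fmean(ys)
    sxx = fmean([(x - mx) ** 2 for x in xs])
    syy = fmean([(y - my) ** 2 for y in ys])
    sxy = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return (mx, my), ((sxx, sxy), (sxy, syy))


def bivariate_normal_pdf(x, y, mean, cov):
    """Density of the fitted bivariate normal at (x, y); cov must be positive definite."""
    (mx, my), ((sxx, sxy), (_, syy)) = mean, cov
    det = sxx * syy - sxy**2
    dx, dy = x - mx, y - my
    # Quadratic form d^T Sigma^{-1} d, using the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


if __name__ == "__main__":
    # Hypothetical pair of numerical columns.
    xs = [1.0, 2.0, 2.5, 3.0, 4.0, 4.5]
    ys = [2.1, 2.9, 3.8, 3.6, 5.2, 5.0]

    mean, cov = fit_bivariate_normal(xs, ys)
    rho = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
    print("mean:", mean)
    print("covariance:", cov)
    print("correlation:", round(rho, 3))
    print("density at the mean:", round(bivariate_normal_pdf(*mean, mean, cov), 4))
```

If the two variables are perfectly linearly related, the covariance matrix is singular (det = 0) and the density is undefined, which is why the sketch notes that `cov` must be positive definite.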

Dataiku uses a Gaussian kernel for the 2D KDE plot. The X and Y relative bandwidth parameters scale the horizontal and vertical kernel bandwidths, respectively: the smaller the values, the less smooth (and more detailed) the KDE plot appears.
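The following Python sketch illustrates a 2D Gaussian KDE with separate relative bandwidths per axis. The exact way Dataiku turns the relative bandwidth parameters into absolute bandwidths is not described here, so the sketch assumes, hypothetically, that each axis starts from a Scott's-rule bandwidth (standard deviation times n^(-1/6) in two dimensions) and multiplies it by the relative factor. The kernel is a product of two one-dimensional Gaussians, which corresponds to a diagonal bandwidth matrix.

```python
import math
from statistics import stdev


def kde_2d(xs, ys, rel_bw_x=1.0, rel_bw_y=1.0):
    """Return a function that estimates the joint density at (x, y).

    Assumption: each axis uses a Scott's-rule bandwidth (std * n ** (-1/6) in 2D),
    multiplied by that axis's relative bandwidth factor.
    """
    n = len(xs)
    hx = rel_bw_x * stdev(xs) * n ** (-1 / 6)
    hy = rel_bw_y * stdev(ys) * n ** (-1 / 6)
    # Normalizing constant of the averaged product of two 1D Gaussian kernels.
    norm = 1.0 / (n * 2 * math.pi * hx * hy)

    def density(x, y):
        total = 0.0
        for xi, yi in zip(xs, ys):
            u = (x - xi) / hx
            v = (y - yi) / hy
            total += math.exp(-0.5 * (u * u + v * v))
        return norm * total

    return density


if __name__ == "__main__":
    # Hypothetical pair of numerical columns.
    xs = [1.0, 1.2, 1.9, 3.0, 3.1, 3.3]
    ys = [2.0, 2.4, 2.2, 4.1, 3.9, 4.4]

    sharp = kde_2d(xs, ys, rel_bw_x=0.5, rel_bw_y=0.5)
    smooth = kde_2d(xs, ys, rel_bw_x=2.0, rel_bw_y=2.0)
    for point in [(1.2, 2.2), (2.2, 3.2), (3.1, 4.1)]:
        print(point, round(sharp(*point), 4), round(smooth(*point), 4))
```

Smaller relative bandwidths concentrate each kernel around its data point, so the sharper estimate peaks higher near clusters and dips lower between them, matching the behavior described above.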

Curve fitting

Similarly, for numerical columns, the Fit Curve card models the relationship between two variables. It can use an Isotonic curve, which fits a free-form (non-parametric) line to the data under a single constraint: the curve must be monotonic, that is, non-decreasing or non-increasing everywhere.

../../_images/isotonic-curve.png
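A standard way to compute a least-squares isotonic fit is the pool-adjacent-violators algorithm (PAVA), sketched below in Python. Whether Dataiku uses PAVA internally is an assumption; the function name and data are hypothetical. The sketch expects the y values already ordered by x and does not treat tied x values specially. The non-increasing case is handled by fitting a non-decreasing curve to the negated values and negating the result.

```python
def isotonic_fit(ys, increasing=True):
    """Least-squares isotonic regression via the pool-adjacent-violators algorithm.

    ys must already be ordered by the x variable. Returns one fitted value per input.
    """
    sign = 1.0 if increasing else -1.0
    # Each block stores [sum of values, count]. Adjacent blocks whose means violate
    # monotonicity are merged, and the merged block takes their pooled mean.
    blocks = []
    for y in ys:
        blocks.append([sign * y, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([sign * total / count] * count)
    return fitted


if __name__ == "__main__":
    # Hypothetical (x, y) pairs, sorted by x before fitting.
    pairs = sorted([(1, 1.0), (2, 3.0), (3, 2.0), (4, 4.0), (5, 3.5), (6, 5.0)])
    xs = [x for x, _ in pairs]
    fitted = isotonic_fit([y for _, y in pairs])
    for x, y_hat in zip(xs, fitted):
        print(x, y_hat)
```

The result is a step function: wherever the raw data decrease locally, PAVA replaces the offending run with its average, which is why isotonic curves often look flat in places.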

It can also use a Polynomial curve, which fits a polynomial function of a specified degree to the data.

../../_images/polynomial-curve.png
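A polynomial curve of degree d is typically fitted by least squares. The Python sketch below solves the normal equations with Gaussian elimination; it is a from-scratch illustration, not Dataiku's implementation, and the helper names and data are hypothetical. For high degrees, the normal equations become numerically ill-conditioned, and library routines based on QR or SVD decompositions are preferable.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients [c0, c1, ..., c_degree], lowest power first.

    Solves the normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix.
    """
    m = degree + 1
    # Entry (j, k) of V^T V is sum(x ** (j + k)); entry j of V^T y is sum(y * x ** j).
    a = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum(y * x**j for x, y in zip(xs, ys)) for j in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for row in reversed(range(m)):
        s = sum(a[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = (b[row] - s) / a[row][row]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the polynomial at x with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


if __name__ == "__main__":
    # Hypothetical noisy observations of a roughly quadratic relationship.
    xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    ys = [1.1, 1.6, 3.2, 4.4, 6.9, 9.1, 12.2]
    coeffs = polyfit(xs, ys, degree=2)
    print("coefficients:", [round(c, 3) for c in coeffs])
    print("prediction at x = 1.75:", round(polyval(coeffs, 1.75), 3))
```

Raising the degree always lowers the training error but can make the curve oscillate between data points, so the degree is a trade-off between flexibility and overfitting.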