Concept | Statistics cards for fit curves and distributions#

Distribution fitting#

Using the Fit Distribution card in Dataiku, we can fit univariate distributions, such as the Gaussian (normal), exponential, and beta distributions, to the data in each numerical column of a dataset.

The card displays goodness-of-fit metrics, the estimated parameters of each fitted distribution, and a Q-Q plot that compares the quantiles of the data to the quantiles of the fitted distributions. Points that lie far from the identity line in the Q-Q plot indicate a poor fit.

../../_images/QQplot.png
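To make the ideas behind the card concrete, here is a minimal sketch that reproduces the same three ingredients outside of Dataiku with NumPy and SciPy: estimated parameters, a goodness-of-fit metric, and the points of a Q-Q plot. The synthetic data, the choice of the Kolmogorov-Smirnov statistic, and the plotting positions are assumptions made for illustration. They are not a description of how the card itself is implemented.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for one numerical column (an assumption for this sketch).
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=500)

# Maximum-likelihood estimates of the normal distribution's parameters.
mu, sigma = stats.norm.fit(data)

# Kolmogorov-Smirnov statistic as one possible goodness-of-fit metric.
ks = stats.kstest(data, "norm", args=(mu, sigma))

# Q-Q points: sorted data versus quantiles of the fitted distribution,
# evaluated at the plotting positions (i - 0.5) / n.
probs = (np.arange(1, data.size + 1) - 0.5) / data.size
empirical_q = np.sort(data)
theoretical_q = stats.norm.ppf(probs, loc=mu, scale=sigma)

# Largest distance of any Q-Q point from the identity line y = x.
max_dev = np.max(np.abs(empirical_q - theoretical_q))
print(f"mu={mu:.3f} sigma={sigma:.3f} KS={ks.statistic:.3f} max_dev={max_dev:.3f}")
```

Plotting `theoretical_q` against `empirical_q` gives the Q-Q plot. Swapping `stats.norm` for `stats.expon` or `stats.beta` fits a different family in the same way.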

We can also fit a bivariate normal (or Joint normal) distribution to a pair of variables, or visualize their 2-dimensional kernel density estimate (2D KDE), by using the 2D Fit Distribution card.

../../_images/2dKDEplot.png
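As an illustration of what fitting a bivariate normal distribution involves, the sketch below estimates the mean vector and covariance matrix of two variables with NumPy and builds the fitted density with SciPy. The synthetic data and the use of maximum-likelihood estimates are assumptions for this example, and the code is independent of Dataiku.

```python
import numpy as np
from scipy import stats

# Synthetic pair of correlated variables (an assumption for this sketch).
rng = np.random.default_rng(1)
true_mean = [0.0, 3.0]
true_cov = [[1.0, 0.6], [0.6, 2.0]]
xy = rng.multivariate_normal(true_mean, true_cov, size=2000)

# Maximum-likelihood estimates: sample mean and covariance divided by n.
mean_hat = xy.mean(axis=0)
cov_hat = np.cov(xy, rowvar=False, bias=True)

# Fitted joint density and the average log-likelihood of the data under it.
fitted = stats.multivariate_normal(mean=mean_hat, cov=cov_hat)
avg_loglik = fitted.logpdf(xy).mean()
print(mean_hat, cov_hat, avg_loglik, sep="\n")
```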

Dataiku uses a Gaussian kernel for the 2D KDE plot and accepts values for the X and Y relative bandwidth parameters, which scale the horizontal and vertical KDE bandwidths. The smaller the parameter values, the less smooth the KDE plot appears.
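The effect of the relative bandwidths can be sketched with a small NumPy implementation of a 2D Gaussian product-kernel density estimate. The function `kde_2d`, its parameters `rel_bw_x` and `rel_bw_y`, and the Scott's-rule baseline bandwidth are hypothetical choices for this example. The article does not state which baseline bandwidth Dataiku scales.

```python
import numpy as np

def kde_2d(x, y, grid_x, grid_y, rel_bw_x=1.0, rel_bw_y=1.0):
    """Evaluate a 2D Gaussian product-kernel density estimate on a grid."""
    n = x.size
    # Assumed baseline: Scott's rule in two dimensions, std * n**(-1/6),
    # multiplied by the relative bandwidth of each axis.
    h_x = rel_bw_x * x.std(ddof=1) * n ** (-1.0 / 6.0)
    h_y = rel_bw_y * y.std(ddof=1) * n ** (-1.0 / 6.0)
    # Standardized distances from every grid coordinate to every sample.
    u = (grid_x[:, None] - x[None, :]) / h_x
    v = (grid_y[:, None] - y[None, :]) / h_y
    k_x = np.exp(-0.5 * u**2) / (h_x * np.sqrt(2.0 * np.pi))
    k_y = np.exp(-0.5 * v**2) / (h_y * np.sqrt(2.0 * np.pi))
    # density[i, j] is the average over samples of k_x[i, s] * k_y[j, s].
    return k_x @ k_y.T / n

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 0.5 * x + rng.normal(scale=0.8, size=300)
gx = np.linspace(-5.0, 5.0, 101)
gy = np.linspace(-5.0, 5.0, 101)

smooth = kde_2d(x, y, gx, gy, rel_bw_x=1.0, rel_bw_y=1.0)
rough = kde_2d(x, y, gx, gy, rel_bw_x=0.2, rel_bw_y=0.2)

# Area of one grid cell, used to check that the density integrates to about 1.
cell = (gx[1] - gx[0]) * (gy[1] - gy[0])
print(smooth.sum() * cell, smooth.max(), rough.max())
```

With the smaller relative bandwidths, each sample contributes a narrower bump, so the estimate follows individual points more closely and looks less smooth.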

Curve fitting#

Similarly, for numerical columns, the Fit Curve card lets us model the relationship between two variables with one of two curve types. The first is an Isotonic curve, which fits a free-form linear model that is constrained to be monotonic (either non-decreasing or non-increasing).

../../_images/isotonic-curve.png

The second is a Polynomial curve, which fits a polynomial function of a specified degree.

../../_images/polynomial-curve.png
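The two curve types can be sketched in plain NumPy. The helper `isotonic_fit` below is a hypothetical name for a small pool-adjacent-violators routine that returns a non-decreasing least-squares fit, and the polynomial is fitted with `numpy.polynomial.Polynomial.fit`. The synthetic data and the degree of 3 are assumptions for the example. Nothing here is taken from Dataiku's implementation.

```python
import numpy as np

def isotonic_fit(x, y):
    """Non-decreasing least-squares fit via pool-adjacent-violators."""
    order = np.argsort(x)
    # Each block holds [sum of values, count]; its fitted level is sum / count.
    blocks = []
    for value in y[order]:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted_sorted = np.concatenate([np.full(c, s / c) for s, c in blocks])
    # Return the fitted values in the original order of x.
    fitted = np.empty_like(fitted_sorted)
    fitted[order] = fitted_sorted
    return fitted

# Synthetic increasing relationship with noise (an assumption for this sketch).
rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 200)
y = np.log1p(x) + rng.normal(scale=0.2, size=x.size)

iso = isotonic_fit(x, y)

# Least-squares polynomial of a specified degree.
poly = np.polynomial.Polynomial.fit(x, y, deg=3)
poly_pred = poly(x)

print(np.mean((iso - y) ** 2), np.mean((poly_pred - y) ** 2))
```

A non-increasing isotonic fit can be obtained from the same helper as `-isotonic_fit(x, -y)`.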