How to fill empty cells of a column with the value of the corresponding row from another column

Handling missing data is one data preparation challenge that analysts routinely face. Should you discard observations with missing values or perhaps impute missing values with a summary value like the median?

To handle missing data, the Prepare recipe has dozens of built-in processors ready to solve many of the most common challenges without any coding. In addition, Dataiku DSS has its own Formula language to craft more custom solutions.

For example, in some cases, you may want to fill the empty cells of a column with values of the corresponding rows from another column.

In a Prepare recipe, use the Formula processor with the coalesce() function as shown below:

../../../_images/kb-coalesce-1.png

Here we fill the empty values of `col1` with the corresponding values of `col2` in a new column.
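Assuming the formula takes the form `coalesce(col1, col2)`, the row-by-row behavior can be sketched in plain Python. This is an illustration only, not Dataiku code: the `coalesce` helper, the sample rows, and the output column name `col1_filled` are hypothetical, and the sketch assumes that both null values and empty strings count as empty.

```python
def coalesce(*values):
    """Return the first argument that is neither None nor an empty string.

    Hypothetical stand-in for the Formula function of the same name.
    """
    for value in values:
        if value is not None and value != "":
            return value
    return None


# Made-up sample rows: col1 is sometimes empty, col2 is always filled.
rows = [
    {"col1": "a", "col2": "x"},
    {"col1": "", "col2": "y"},
    {"col1": None, "col2": "z"},
]

# Write the result to a new column, leaving col1 and col2 untouched.
for row in rows:
    row["col1_filled"] = coalesce(row["col1"], row["col2"])

filled = [row["col1_filled"] for row in rows]
print(filled)
```

Rows where `col1` already has a value keep it; only the empty cells take the value from `col2`.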

You can also pass more than two columns, or supply a literal fallback value to use when all of the listed columns are empty.

../../../_images/dimitri_0-1588874053641.png

Here we fill the empty values of `col1` with the values of `col2`, or `0` when `col2` is also empty.
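The fallback chain can be sketched the same way, assuming a formula of the form `coalesce(col1, col2, 0)`. The Python `coalesce` helper and the sample rows below are hypothetical and only mimic the described behavior: arguments are checked left to right, and the literal `0` is used when every column before it is empty.

```python
def coalesce(*values):
    """Return the first argument that is neither None nor an empty string.

    Hypothetical stand-in for the Formula function of the same name.
    """
    for value in values:
        if value is not None and value != "":
            return value
    return None


# Made-up sample rows covering the three cases described above.
rows = [
    {"col1": 5, "col2": 7},        # col1 present: keep col1
    {"col1": None, "col2": 7},     # col1 empty: fall back to col2
    {"col1": None, "col2": None},  # both empty: fall back to the literal 0
]

filled = [coalesce(row["col1"], row["col2"], 0) for row in rows]
print(filled)
```

Because the literal comes last, it only applies when no earlier argument supplies a value.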

The Formula language gives you the flexibility to achieve more customized tasks. For example, you can combine functions in the same expression.

../../../_images/kb-coalesce-3.png

Here we fill the empty values of `col1` with the corresponding values of `col2`, rounded down to the nearest integer, in a new column.
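As a rough sketch of combining functions, assume a formula of the form `coalesce(col1, floor(col2))`. The Python below is hypothetical and only mirrors that logic: `coalesce` and `floor_or_none` are illustrative helpers, and the sample values are made up.

```python
import math


def coalesce(*values):
    """Return the first argument that is neither None nor an empty string.

    Hypothetical stand-in for the Formula function of the same name.
    """
    for value in values:
        if value is not None and value != "":
            return value
    return None


def floor_or_none(value):
    """Round down to the nearest integer, passing missing values through."""
    return None if value is None else math.floor(value)


# Made-up sample rows: col2 holds decimals that should be floored before use.
rows = [
    {"col1": 3, "col2": 9.8},      # col1 present: col2 is ignored
    {"col1": None, "col2": 4.7},   # col1 empty: use floor(4.7)
    {"col1": None, "col2": -1.2},  # flooring rounds toward negative infinity
]

filled = [coalesce(row["col1"], floor_or_none(row["col2"])) for row in rows]
print(filled)
```

Only the fallback value is floored; existing values in `col1` pass through unchanged.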

Where can I find more information?

See this article and video to learn more about using Formulas in Dataiku DSS.

What’s next?

To learn more about visual data wrangling in Dataiku DSS, see this series of hands-on tutorials.