Tip | Number summation vs. concatenation across columns in Dataiku formulas#

In general, 1+1=2. However, in Dataiku, the following may also be true in certain cases: 1+1=11. What’s happening here? When are numbers concatenated rather than summed?

When 1+1=11 and 1+1=2 both happen#

Consider this example:

A

B

C

D

= A + B + C + D

1

2

3

4

10

null

2

3

4

234

1

null

3

4

134

1

2

null

4

34

1

2

3

null

6

But all the data is stored as integers! What’s the logic behind this add/concatenate behavior?

Logic:

  • 1+2+3+4=10 ← add 1 + 2, add 3, add 4

  • null+2+3+4=234 ← (null + 2), then concatenate 3, concatenate 4

  • 1+null+3+4=134 ← (1 + null), then concatenate 3, concatenate 4

  • 1+2+null+4=34 ← add 1 + 2, then (+ null), then concatenate 4

  • 1+2+3+null=6 ← add 1 + 2, add 3, (+ null), then?

See the pattern of how addition turns into concatenation? It always happens after a null value. Why? The short answer is, “It’s complicated.”

Switching from add → concatenate#

Here’s the long answer: Autodetection for operations occurs by value and not by column. The detection of the column data type occurs after all the operations have been completed.

Concatenate always follows null + anything. It doesn’t matter that the data in your columns are numbers, or that they’re stored as a number data type, or that logically 1+nothing+3+4 should be 8, not 134.

What happened in the case of A+B+C+D above is that null values are turned into “”. Whenever the processor sees anything with a + “” it defaults to concatenation and turns the result into a string.

Guide for safe sums in Dataiku#

Your first option is to get rid of null values. Maybe you replace them with 0 or maybe you do something else. It’ll depend on your data.

There are many cases, however, where you want to keep null values to do other types of analyses (such as count null). If so, you have a few options:

  • Use the sum() function: sum([A,B,C,D]).

  • Include a nested IF: if(isnull(A),0,A)+if(isnull(B),0,B).

  • Replace if(isnull(A),0,A) with coalesce(A,0).

See also

See Formula language in the reference documentation to learn more.