This content is also included in the free Dataiku Academy course, Basics 101, which is part of the Core Designer learning path. Register for the course there if you’d like to track and validate your progress alongside concept videos, summaries, hands-on tutorials, and quizzes.
Each dataset column in Dataiku has two kinds of “types”: a storage type and a meaning.
The meaning gives a “rich” semantic label to the data type. Meanings are automatically detected from the contents of the columns, but you can also define custom meanings. Meanings have high-level definitions such as URL, IP address, or country. Each meaning is able to validate a cell value, so each cell is either valid or invalid for a given meaning.
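To make the idea of per-cell validation concrete, here is a minimal sketch in plain Python. It is not Dataiku's implementation or API: the validator functions, the `VALIDATORS` mapping, and `validate_column` are hypothetical names chosen for illustration, and the rules they apply are simplified assumptions about what each meaning accepts.

```python
import ipaddress
from urllib.parse import urlparse


def is_valid_ip(value: str) -> bool:
    """Hypothetical check for an 'IP address' meaning (IPv4 or IPv6)."""
    try:
        ipaddress.ip_address(value.strip())
        return True
    except ValueError:
        return False


def is_valid_url(value: str) -> bool:
    """Hypothetical check for a 'URL' meaning: http(s) scheme plus a host."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_valid_integer(value: str) -> bool:
    """Hypothetical check for an 'integer' meaning."""
    try:
        int(value.strip())
        return True
    except ValueError:
        return False


# Illustrative mapping from a meaning's name to its validator (an assumption,
# not a list of the meanings Dataiku ships with).
VALIDATORS = {
    "ip address": is_valid_ip,
    "url": is_valid_url,
    "integer": is_valid_integer,
}


def validate_column(values, meaning):
    """Return one True/False flag per cell for the given meaning."""
    check = VALIDATORS[meaning]
    return [check(v) for v in values]


flags = validate_column(["192.168.0.1", "999.1.1.1", "hello"], "ip address")
print(flags)  # [True, False, False]
```

The same column can be valid under one meaning and invalid under another, which is why validity is always stated relative to a given meaning.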
Dataiku indicates an inferred meaning in blue at the top of each column. Unlike the storage type, the meaning does not tell Dataiku how to store the data in the back end. You can still put it to work in several ways: for example, to enable column transformations, to measure the data quality of a column, and to make specific values easier to find.
When preparing your dataset, Dataiku displays a context-sensitive menu that depends on the values in the column. For example, a column of unparsed dates and a natural language column will each have their own relevant transformation options.
When the Dataiku-detected meaning does not reflect the values in the column, you might want to select a less restrictive meaning. For example, you could change the meaning from “integer” to “text” when some of the values in the column contain text. You can even create your own meanings!
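The reasoning behind switching to a less restrictive meaning can be sketched in a few lines of Python. This is only an illustration of the decision, not how Dataiku detects meanings: `is_integer`, `invalid_cells`, and `choose_meaning` are hypothetical helpers, and the rule "fall back to text as soon as one cell is invalid" is an assumption made to keep the example small.

```python
def is_integer(value: str) -> bool:
    """Hypothetical validator for an 'integer' meaning."""
    try:
        int(value.strip())
        return True
    except ValueError:
        return False


def invalid_cells(values, check):
    """Return the cells that fail the given validator."""
    return [v for v in values if not check(v)]


def choose_meaning(values):
    """Keep 'integer' only if every cell validates; otherwise use 'text'.

    Assumption for this sketch: 'text' accepts any value, so it is the
    least restrictive fallback.
    """
    return "text" if invalid_cells(values, is_integer) else "integer"


column = ["12", "7", "n/a", "42"]
print(invalid_cells(column, is_integer))  # ['n/a']
print(choose_meaning(column))             # text
```

Listing the invalid cells first is useful in practice: if only a handful of values fail, you may prefer to keep the stricter meaning and clean those values instead.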