Concept: Storage Type and Meaning

You might be wondering why there are two kinds of “types”.

The storage type indicates how the dataset backend should store the column data, and how many bytes will be allocated to store these values. Common storage types are string, integer, float, boolean, and date.

Meanwhile, the meaning gives a “rich” semantic label to the data. Meanings are automatically detected from the contents of the columns, but you can also define custom meanings. Meanings have high-level definitions such as URL, IP address, or country. Each meaning can validate a cell value, so each cell is either valid or invalid for a given meaning.

../../../_images/storage-type-vs-meaning.png
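To make the idea of a meaning validating individual cells concrete, here is a minimal Python sketch. It is not how DSS implements meanings: the validator functions and the `VALIDATORS` registry are hypothetical names chosen for illustration, and the rules are deliberately simple. Every cell is stored as a string, yet each one is valid for some meanings and invalid for others.

```python
import ipaddress
from urllib.parse import urlparse


def is_valid_ip(value: str) -> bool:
    """Return True if the cell parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value.strip())
        return True
    except ValueError:
        return False


def is_valid_url(value: str) -> bool:
    """Return True if the cell has an http(s) scheme and a host part."""
    try:
        parsed = urlparse(value.strip())
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Hypothetical registry mapping a meaning name to its validator.
VALIDATORS = {"ip address": is_valid_ip, "url": is_valid_url}

# The storage type of every cell here is string; only validity differs.
cells = ["192.168.0.1", "https://example.com/page", "not an address"]
report = {
    meaning: [check(cell) for cell in cells]
    for meaning, check in VALIDATORS.items()
}

for meaning, flags in report.items():
    print(meaning, flags)
```

The same string column can therefore be fully valid, partly valid, or fully invalid depending on which meaning is applied to it.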

Storage types and meanings are related. Both constrain the values that the column can contain and are useful in managing data in different ways. You can find the storage type and meaning of each column in the Dataset view, when importing a dataset, and in the Explore tab for any dataset in your project.

The storage type of a column affects its ability to serve as a key column when joining two datasets. For example, a string column in one dataset cannot be used as a join key against an integer column in another dataset.
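The following Python sketch illustrates why join keys need matching storage types. It is an illustration in plain Python, not DSS's join logic: the `inner_join` helper and the sample rows are assumptions made for this example. A string key `"101"` and an integer key `101` never compare equal, so the join finds nothing until one side is cast to the other's type.

```python
def inner_join(left, right, key):
    """Hash-based inner join of two lists of dict rows on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


# customer_id is stored as a string in one dataset...
customers = [
    {"customer_id": "101", "name": "Ada"},
    {"customer_id": "102", "name": "Grace"},
]
# ...and as an integer in the other.
orders = [
    {"customer_id": 101, "amount": 25.0},
    {"customer_id": 103, "amount": 12.5},
]

# "101" != 101, so no rows match while the types differ.
mismatched = inner_join(customers, orders, "customer_id")

# Casting one key column to the other's type makes the keys comparable.
orders_as_string = [
    {**row, "customer_id": str(row["customer_id"])} for row in orders
]
matched = inner_join(customers, orders_as_string, "customer_id")

print(len(mismatched), matched)
```

Aligning the storage types of both key columns before the join avoids this kind of mismatch.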

In the Explore tab of a dataset, DSS displays a context-sensitive menu that depends on the values in the column. For example, a column of unparsed dates and a column of natural language text each get their own relevant transformation options.

../../../_images/context-sensitive-menu.png

When the DSS-detected meaning does not reflect the values in the column, you might want to select a less restrictive meaning. For example, you might change the meaning from “integer” to “text” when some of the values in the column contain text.

../../../_images/value-validity.png
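As a rough sketch of the trade-off between a restrictive and a less restrictive meaning, the Python below computes the share of valid cells in a column under two meanings. The functions `is_integer`, `is_text`, and `valid_ratio` are hypothetical helpers, and the rule that empty cells are skipped is an assumption of this example rather than documented DSS behavior.

```python
def is_integer(value: str) -> bool:
    """Valid if the cell is accepted by Python's int() after trimming."""
    try:
        int(value.strip())
        return True
    except ValueError:
        return False


def is_text(value: str) -> bool:
    """The least restrictive meaning: any cell is valid text."""
    return True


def valid_ratio(cells, check):
    """Fraction of non-empty cells that are valid for a meaning."""
    non_empty = [cell for cell in cells if cell.strip()]
    if not non_empty:
        return 1.0
    return sum(check(cell) for cell in non_empty) / len(non_empty)


# A column that is mostly numeric but contains some free text.
column = ["12", "7", "n/a", "42", "unknown"]

integer_ratio = valid_ratio(column, is_integer)  # 3 of 5 cells are valid
text_ratio = valid_ratio(column, is_text)        # every cell is valid

print(f"integer: {integer_ratio:.0%} valid, text: {text_ratio:.0%} valid")
```

Under the “integer” meaning two of the five cells are flagged as invalid, while under “text” none are, which is the situation where switching to the less restrictive meaning makes sense.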