Tip | Good dataset naming schemes

Properly naming your datasets is an important element for collaboration. Good naming helps you and your colleagues quickly understand what a Flow achieves. Ideally, dataset names should be readable, self-explanatory, and short.

When creating a recipe, Dataiku creates a default output name by appending the name of the operation to the input’s name. This naming scheme is simple, but as recipes are chained, the accumulated suffixes quickly make names long and hard to read.

Try to replace this default name with something more self-explanatory. A good method is to focus on what the created dataset will be used for and to choose names that differentiate it from its input. For example, with foo_raw and foo_clean, it is clear that the input holds raw data and the output holds cleaned data.

Compatible naming conventions

The following rules keep names compatible with all storage connections and tools (SQL dialects, HDFS, Python dataframe columns, etc.):

  • Use only alphanumeric characters and underscores (_).

  • Use only lowercase characters.

  • Do not use spaces.

  • Do not begin the name with a number.
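As an illustration, the rules above can be sketched as a small Python check. This is a minimal sketch, not part of Dataiku: the function names `is_compatible_name` and `to_compatible_name` are hypothetical, and the sketch assumes that "alphanumeric" means ASCII letters and digits only.

```python
import re

# Assumption: "alphanumeric" is read as ASCII letters and digits.
# First character: lowercase letter or underscore (never a digit).
# Remaining characters: lowercase letters, digits, or underscores.
_COMPATIBLE_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def is_compatible_name(name: str) -> bool:
    """Return True if `name` follows the four rules listed above."""
    return _COMPATIBLE_NAME.fullmatch(name) is not None


def to_compatible_name(label: str) -> str:
    """Best-effort conversion of an arbitrary label into a compatible name.

    Hypothetical helper: lowercases the label, collapses every run of
    disallowed characters into one underscore, and prefixes an underscore
    if the result would otherwise begin with a digit.
    """
    cleaned = re.sub(r"[^a-z0-9]+", "_", label.strip().lower()).strip("_")
    if not cleaned:
        raise ValueError(f"cannot derive a name from {label!r}")
    if cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned
```

With this sketch, `is_compatible_name("foo_raw")` returns `True`, `is_compatible_name("Foo Clean")` returns `False`, and `to_compatible_name("Foo Clean")` returns `"foo_clean"`.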

Optionally, you can adopt prefixes and suffixes for your datasets (e.g., foo_t for a dataset in a SQL database, or foo_hdfs for an HDFS dataset).

Keep the same tips in mind when naming dataset columns, notebooks, and projects.

Tip

You can rename a dataset in the Flow by right-clicking it to open the context menu, or by using the same action in the right panel.