Dataiku enables users to split, or partition, datasets along meaningful dimensions. These partitions, or subsets of the original dataset, can then be computed independently.
If your recipe reads or writes partitioned datasets, make sure it reads and writes the correct partitions.
Reading and writing
If your recipe deals with partitioned datasets, in input or output, you don't need to specify the source or destination partitions in your code. Dataiku determines which partitions to read and write from the partition dependencies and from the partitions targeted by the build.
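To read the input partitions (as defined by the partition dependencies), use get_dataframe(). It automatically returns only the relevant partitions. To write to the output partition being built, use write_with_schema() or write_dataframe(); the destination partition is selected for you in the same way.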
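A minimal sketch of a Python recipe body, assuming one partitioned input dataset and one partitioned output dataset (the names "orders" and "orders_clean" and the dropna() transformation are hypothetical). The logic is wrapped in a function that receives the two dataset objects, so it never names a partition: get_dataframe() returns the input partitions selected by the partition dependencies, and write_with_schema() writes to the partition being built.

```python
def process_partition(input_dataset, output_dataset):
    """Read the input partitions chosen by Dataiku, transform, and write.

    No partition identifier appears here: Dataiku resolves the input
    partitions from the partition dependencies and the output partition
    from the build being run.
    """
    # Returns a pandas DataFrame holding only the relevant input partitions
    df = input_dataset.get_dataframe()

    # Hypothetical transformation: drop rows that contain missing values
    df = df.dropna()

    # Writes the DataFrame and its schema to the output partition being built
    output_dataset.write_with_schema(df)
    return df


# Inside a Dataiku Python recipe you might call it like this
# (dataset names are hypothetical):
#   import dataiku
#   process_partition(dataiku.Dataset("orders"), dataiku.Dataset("orders_clean"))
```

Passing the dataset objects in, rather than creating them inside the function, is only a convenience that lets the logic be exercised outside Dataiku; in a recipe you can equally call the methods at the top level.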
For purposes other than reading or writing dataframes, you can get the name of the partition being built (as well as other flow variables) from the Python dictionary dataiku.dku_flow_variables, described in the reference documentation.
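A minimal sketch of looking up the destination partition in dku_flow_variables, for example to tag a log line or name a report file. It assumes the output is partitioned on a discrete dimension (the name "country" is hypothetical) and that the matching variable is named DKU_DST_<dimension>; treat that key format as an assumption and check the exact variable names in the reference documentation. The helper takes the dictionary as a parameter so it can be tried outside Dataiku.

```python
def destination_partition(flow_variables, dimension):
    """Return the value of `dimension` for the partition being built.

    Assumes the flow variable is named "DKU_DST_<dimension>"; verify the
    exact key in the reference documentation.
    """
    key = "DKU_DST_" + dimension
    if key not in flow_variables:
        raise KeyError(
            f"no flow variable {key!r}; is the output partitioned on {dimension!r}?"
        )
    return flow_variables[key]


# Inside a Dataiku Python recipe (dimension name is hypothetical):
#   import dataiku
#   country = destination_partition(dataiku.dku_flow_variables, "country")
#   print(f"Building partition {country}")
```

Raising a KeyError with an explicit message makes a misconfigured partitioning scheme easier to diagnose than a bare dictionary lookup would.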
The dataset.get_write_partition() method is deprecated; use dataiku.dku_flow_variables to find the partition being written.