Partitioning
Dataiku enables users to split, or partition, datasets along meaningful dimensions. These partitions, or subsets of the original dataset, can then be computed independently.
Learn more about this technique in the following concept articles and tutorials.
Tip
Validate your knowledge of this area by registering for the Dataiku Academy course, Partitioning. Then challenge yourself to earn a certification!
Tip | Interacting with partitioned datasets using the Python API
If your recipe deals with partitioned datasets, in input or output, you need to be careful about reading and/or writing the correct data.
Reading and writing
When a recipe has partitioned datasets as input or output, you don’t need to specify the source or destination partitions in your code. Dataiku handles the reading and writing for you.
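To read from the input partitions (as defined by the partition dependencies), use “get_dataframe()”. This automatically gives you the relevant partitions. To write to the output partition being built, use a dataset write method such as “write_with_schema()”; here too, no partition needs to be named in the code.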
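As a minimal sketch of the pattern above, the function below reads an input dataset and writes it to an output dataset without naming any partition. The function name `copy_partition` and the dataset names in the usage comment are hypothetical. The sketch assumes it runs inside a Dataiku Python recipe, where the `dataiku` package is available and the read and write are scoped to the partitions of the current build.

```python
def copy_partition(input_name, output_name):
    """Read the partitions Dataiku selected for this build and write them out."""
    # The dataiku package only exists inside a Dataiku Python environment,
    # so it is imported here rather than at module level.
    import dataiku

    input_ds = dataiku.Dataset(input_name)
    output_ds = dataiku.Dataset(output_name)

    # In a recipe run, this returns only the input partitions resolved
    # from the partition dependencies. No partition id is passed.
    df = input_ds.get_dataframe()

    # In a recipe run, this writes to the output partition being built.
    output_ds.write_with_schema(df)

    # Return the number of rows copied, for logging or checks.
    return len(df)


# Usage inside a recipe (dataset names are hypothetical):
# copy_partition("orders_raw", "orders_clean")
```

Importing `dataiku` inside the function is only a convenience for this sketch; in an actual recipe the import usually sits at the top of the file.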
Other purposes
For purposes other than reading or writing dataframes, you can access the name of the partition being built (as well as any other variables) through the Python dictionary called “dku_flow_variables”. This dictionary is available as dataiku.dku_flow_variables, as described in the reference documentation.
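The lookup can be sketched as a small helper that filters the flow variables dictionary by key prefix. The helper name `destination_partition_values` is hypothetical, and the `DKU_DST_` prefix for output partition dimensions is an assumption to verify against the reference documentation or by printing the dictionary in your own recipe.

```python
def destination_partition_values(flow_variables, prefix="DKU_DST_"):
    """Return {dimension_name: value} for flow variables whose key starts with prefix.

    The default prefix is an assumption; check the keys actually present
    in dataiku.dku_flow_variables for your recipe.
    """
    return {
        key[len(prefix):]: value
        for key, value in flow_variables.items()
        if key.startswith(prefix)
    }


# Usage inside a recipe:
# import dataiku
# values = destination_partition_values(dataiku.dku_flow_variables)
```

Passing the dictionary in as an argument keeps the helper usable outside a recipe, for example with a plain dict in a unit test.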
Note
dataset.get_write_partition() is deprecated.