How-To: Feature Store

The Feature Store is a dedicated zone in Dataiku where you and your team can centrally access and share datasets that have been prepared for machine learning.

The Feature Store allows you to share clean, high-value datasets as Feature Groups, so colleagues can easily find information to enrich their own projects without reconstructing the processing pipeline.

Note

Completing this tutorial requires a Dataiku instance running version 11 or later.

Add a Dataset to the Feature Store

Before you can add datasets to the Feature Store, an administrator must grant you the “Manage Feature Store” permission in Dataiku.

To add a dataset to the Feature Store, you promote it to a feature group, which is the term for a dataset that has been flagged for reuse in other projects and machine learning models.

Sharing will be smoother if you first enable the Quick sharing feature, which allows other users to freely use this object in their projects without requesting permission. If you want users to request permission before using the feature group, you can leave Quick sharing off.

To enable Quick sharing:

  • From the Flow, select the dataset you’d like to add to the Feature Store, and navigate to the Actions tab of the right panel.

  • Select Share.

  • Toggle Quick sharing to On and click Share.

Dialogue box to turn on quick sharing of a dataset.

Now promote the dataset to Feature Group status so it will be included in the Feature Store.

  • Select the same dataset and navigate to the Actions tab.

  • Select Publish.

  • Select Feature Store: Promote as Feature Group, then click Promote.

The dataset icon in the Flow now shows two new badges, an arrow and a checkmark ribbon, indicating that the dataset is shared and that it has been promoted to a feature group.

Badges on the dataset icon indicating the object is shared and in the Feature Store.
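If you promote datasets regularly, the same change can be scripted. The sketch below assumes the dataikuapi Python client (Dataiku 11 or later), in which dataset settings expose an is_feature_group property and a set_feature_group() method; verify both against the API reference for your version. The helper name promote_to_feature_group is hypothetical. It only flags the dataset as a feature group and does not turn on Quick sharing.

```python
# Minimal sketch, assuming the dataikuapi client (Dataiku 11+).
# promote_to_feature_group is a hypothetical helper, not part of dataikuapi.

def promote_to_feature_group(client, project_key, dataset_name):
    """Flag a dataset as a feature group, like Publish > Feature Store in the UI.

    The calling user needs the "Manage Feature Store" permission.
    Returns True if the setting was changed, False if the dataset
    was already a feature group.
    """
    dataset = client.get_project(project_key).get_dataset(dataset_name)
    settings = dataset.get_settings()
    if settings.is_feature_group:
        return False  # already promoted, nothing to save
    settings.set_feature_group(True)
    settings.save()  # persist the change on the Dataiku instance
    return True

# Example usage against a live instance (URL, API key, and names are placeholders):
# import dataikuapi
# client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
# promote_to_feature_group(client, "MYPROJECT", "customers_prepared")
```

Because the helper returns False when the dataset is already a feature group, it is safe to run more than once.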

Add a Feature Group to the Flow

Any user with access to the source project can now view the dataset, its schema, and other information in the Feature Store.

  • From the Flow, click +Dataset > Feature Group. Alternatively, navigate directly to the Feature Store from the Applications (waffle) menu near the top right.

  • In the Feature Store, select the dataset you just promoted.

In the right panel, you can view information about the shared Feature Group, including its schema, users, and the creation and latest modification dates.

Dataset details and options in the Feature Store.

You can also select one of three actions for the dataset:

  • Use to add this dataset to another project

  • Explore to view a sample of the dataset and explore it

  • Remove if you no longer want to include this dataset in the Feature Store

Take a few minutes to explore the Feature Store, its search options, and the details for feature groups.
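Browsing can also be done from code. The sketch below assumes the dataikuapi client, where client.get_feature_store() returns a feature store handle whose list_feature_groups() method returns items with project_key and name attributes; check the API reference for your version. The helper find_feature_groups and its case-insensitive name filter are hypothetical, a rough stand-in for the Store's search box rather than a reproduction of it.

```python
# Minimal sketch, assuming the dataikuapi client (Dataiku 11+).
# find_feature_groups is a hypothetical helper, not part of dataikuapi.

def find_feature_groups(client, keyword=""):
    """Return sorted (project_key, dataset_name) pairs for feature groups
    whose name contains keyword (case-insensitive). An empty keyword
    matches every feature group returned by the API."""
    feature_store = client.get_feature_store()
    matches = []
    for item in feature_store.list_feature_groups():
        if keyword.lower() in item.name.lower():
            matches.append((item.project_key, item.name))
    return sorted(matches)

# Example usage against a live instance (URL and API key are placeholders):
# import dataikuapi
# client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
# for project_key, name in find_feature_groups(client, "customer"):
#     print(f"{project_key}.{name}")
```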

What’s Next?

Complete a hands-on tutorial on building your Feature Store in Dataiku.