Tip | Ensuring metadata completeness in Data Collections
Data Collections within the Data Catalog are spaces where key datasets can be grouped together for easy discovery and reuse within an organization.
To help teams work efficiently and accurately, Dataiku offers several ways to add metadata, such as dataset and column descriptions. It also gives Data Collection owners tools to ensure that the datasets in a collection include this metadata.
The collection owner can set up checks to ensure that published datasets include dataset descriptions and/or column descriptions. These checks can either:
Display a warning when the metadata is not included.
Prevent a dataset from being included in a collection when the metadata is not included.
In this example, the included dataset fails the dataset description check, and four of its 33 columns fail the column description check.
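To make the two check behaviors concrete, here is a minimal Python sketch of the logic such a check could apply. It is not Dataiku's implementation or API. The classes `Column`, `DatasetMetadata`, `CheckResult`, and `CheckMode` and the functions `check_metadata` and `can_publish` are hypothetical names chosen for illustration. Treating whitespace-only text as missing is also an assumption.

```python
# Hypothetical model of a Data Collection metadata check (not Dataiku's API).
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CheckMode(Enum):
    WARN = "warn"    # dataset is accepted, but a warning is shown
    BLOCK = "block"  # dataset is kept out of the collection


@dataclass
class Column:
    name: str
    description: Optional[str] = None


@dataclass
class DatasetMetadata:
    name: str
    description: Optional[str] = None
    columns: List[Column] = field(default_factory=list)


@dataclass
class CheckResult:
    missing_dataset_description: bool
    columns_missing_description: List[str]
    total_columns: int

    @property
    def passed(self) -> bool:
        return not self.missing_dataset_description and not self.columns_missing_description

    def summary(self) -> str:
        parts = []
        if self.missing_dataset_description:
            parts.append("dataset description missing")
        if self.columns_missing_description:
            parts.append(
                f"{len(self.columns_missing_description)} of {self.total_columns} "
                "columns missing descriptions"
            )
        return "; ".join(parts) or "all checks passed"


def _is_filled(text: Optional[str]) -> bool:
    # Assumption: None, "" and whitespace-only strings all count as missing
    return bool(text and text.strip())


def check_metadata(
    ds: DatasetMetadata,
    require_dataset_desc: bool = True,
    require_column_desc: bool = True,
) -> CheckResult:
    # The two flags mirror "dataset descriptions and/or column descriptions"
    missing_ds = require_dataset_desc and not _is_filled(ds.description)
    missing_cols = (
        [c.name for c in ds.columns if not _is_filled(c.description)]
        if require_column_desc
        else []
    )
    return CheckResult(missing_ds, missing_cols, len(ds.columns))


def can_publish(result: CheckResult, mode: CheckMode) -> bool:
    # WARN lets a failing dataset through; BLOCK rejects it
    return result.passed or mode is CheckMode.WARN


if __name__ == "__main__":
    cols = [Column(f"col_{i}", f"Description of col_{i}") for i in range(33)]
    for c in cols[:4]:
        c.description = None  # four undocumented columns
    result = check_metadata(DatasetMetadata("orders", None, cols))
    print(result.summary())
    # dataset description missing; 4 of 33 columns missing descriptions
    print(can_publish(result, CheckMode.WARN), can_publish(result, CheckMode.BLOCK))
    # True False
```

Run as a script, the demo reproduces the situation in the example: a missing dataset description and four of 33 columns without descriptions. The same result is accepted in warning mode but rejected in blocking mode.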

Adding metadata
If metadata requirements aren’t met, you’ll see a warning when publishing a dataset to a Data Collection.

Clicking Add metadata takes you to the dataset's Settings > Schema subtab, where you can add descriptions in several ways:
Manually, by selecting each column and typing descriptions.
Syncing descriptions from an underlying database, for datasets pulled from supported connections such as Snowflake, PostgreSQL, or BigQuery.
Using a Large Language Model (LLM) to generate metadata.
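Syncing from the database amounts to copying column comments that already exist at the source into the dataset's column metadata. The sketch below illustrates that idea in plain Python under stated assumptions. `sync_column_descriptions` is a hypothetical helper, and columns are modeled as dicts with hypothetical `name` and `description` keys. The source comments are passed in as a dictionary instead of being read from Snowflake, PostgreSQL, or BigQuery. It does not describe how Dataiku's built-in sync behaves, including whether that sync overwrites existing descriptions.

```python
# Hypothetical sketch: copy database column comments into dataset column metadata.
from typing import Dict, List


def sync_column_descriptions(
    columns: List[dict],
    source_comments: Dict[str, str],
    overwrite: bool = False,
) -> List[str]:
    """Fill column descriptions from source comments; return the names updated."""
    # Match names case-insensitively (e.g. Snowflake upper-cases unquoted identifiers)
    lookup = {name.lower(): text for name, text in source_comments.items()}
    updated = []
    for col in columns:
        comment = lookup.get(col["name"].lower())
        if not comment or not comment.strip():
            continue  # the source has nothing useful for this column
        if col.get("description") and not overwrite:
            continue  # keep a description someone already wrote by hand
        col["description"] = comment.strip()
        updated.append(col["name"])
    return updated


if __name__ == "__main__":
    schema = [{"name": "ID"}, {"name": "amount", "description": "Manual text"}]
    print(sync_column_descriptions(schema, {"id": "Primary key", "AMOUNT": "Order total"}))
    # ['ID']
```

Matching names case-insensitively is a deliberate choice in this sketch because some databases, such as Snowflake, store unquoted identifiers in upper case. Defaulting to `overwrite=False` protects descriptions that someone has already written manually.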
