Reference | Dataiku metastore catalog

The metastore catalog is a concept that originated in the Apache Hive project. A metastore stores associations between file paths (initially on HDFS) and virtual tables.

In Dataiku DSS (DSS), the metastore catalog plays the same role: it stores the association between storage paths and virtual tables.

A “table” in the metastore catalog is made up of the following:

  • The location of the files making up the data,

  • A schema (column names and types),

  • A storage format indicating the file format of the data files, and

  • Various other metadata.
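
To make the four components above concrete, here is a minimal Python sketch of a table entry and a toy catalog that maps storage paths to tables. It is only an illustration: `MetastoreTable`, `Catalog`, the example bucket path, and the column names are all hypothetical and are not part of any DSS or Hive API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MetastoreTable:
    """Hypothetical model of one metastore table entry (not a DSS API)."""
    name: str
    location: str                    # where the data files live
    schema: List[Tuple[str, str]]    # (column name, column type) pairs
    storage_format: str              # file format of the data files
    metadata: Dict[str, str] = field(default_factory=dict)  # other metadata


class Catalog:
    """Toy in-memory catalog associating storage paths with tables."""

    def __init__(self) -> None:
        self._by_location: Dict[str, MetastoreTable] = {}

    def register(self, table: MetastoreTable) -> None:
        # Record the path -> table association.
        self._by_location[table.location] = table

    def lookup(self, location: str) -> Optional[MetastoreTable]:
        # Return the table registered for this path, or None if there is none.
        return self._by_location.get(location)


catalog = Catalog()
catalog.register(
    MetastoreTable(
        name="orders",
        location="s3://example-bucket/orders/",  # assumed example path
        schema=[("order_id", "bigint"), ("amount", "double")],
        storage_format="parquet",
        metadata={"owner": "analytics"},
    )
)

table = catalog.lookup("s3://example-bucket/orders/")
print(table.name, table.storage_format)
```

A real metastore persists these entries and exposes them to query engines; the dictionary here only stands in for the path-to-table association described above.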

Hadoop Distributed File System (HDFS) and cloud storage datasets such as Amazon S3, Azure Blob Storage, and Google Cloud Storage (GCS) can have an associated table in the metastore catalog.

Note

To learn about the three kinds of metastores that DSS can leverage, the DSS integration points with the metastore, and the engines and features that rely on it, see Metastore catalog.