
Business Case

Facies are uniform sedimentary bodies of rock that can be distinguished from one another by physical characteristics such as sedimentary structure and grain size.

The ability to classify facies by their physical characteristics is of great importance in the oil & gas industry. For example, a succession of facies containing sandstone units may indicate a good reservoir, because sandstone tends to have high porosity and permeability, which are ideal conditions for storing hydrocarbons.

Our goal is to increase knowledge of the subsurface and estimate reservoir capacity. To this end, our data team will estimate the quantity of hydrocarbons in a reservoir by observing the lateral extent and geometries of the facies containing the reservoir units.
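As an illustration of how lateral extent and geometry translate into a quantity, one common approach in reservoir engineering is the standard volumetric estimate. It is shown here as a general reference, not as the specific method this use case applies. Here A is the reservoir area (its lateral extent), h the net thickness of the reservoir facies, φ their porosity, S_w the water saturation, and B_o the oil formation volume factor. N is the oil volume at surface conditions, expressed in the same volume units as A·h.

$$
N = \frac{A \, h \, \phi \, (1 - S_w)}{B_o}
$$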

Input data

This use case requires the following two input data sources, available as downloadable archives at the links below:

These datasets come from the paper “Comparison of four approaches to a rock facies classification problem” by Dubois et al.


To understand the workflow of this use case, you should be familiar with:

  • The concepts covered in the Basics 101, 102, and 103 courses

  • The Window recipe

  • Machine Learning in Dataiku DSS

  • Scenarios and Apps (Optional)

Technical Requirements

All you need is access to a Dataiku DSS instance.

Workflow Overview

The final Dataiku DSS pipeline is shown below. You can also follow along with the completed project in the Dataiku gallery.

View of the finished Flow.

The Flow has the following high-level steps:

  1. Upload, join, and clean the datasets

  2. Train and evaluate a machine learning model

  3. Generate features and use them to retrain the model

  4. Perform custom model scoring

  5. Publish insights to a dashboard

  6. Create a Dataiku application
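Step 3 relies on window-style features, which is what the Window recipe computes: values are partitioned by a key, ordered within each partition, and aggregated over a sliding frame. The sketch below shows that idea in plain Python, outside Dataiku. It assumes a hypothetical log table with a well name (partition key), a depth (ordering key), and a gamma-ray reading, and computes a centered rolling mean over `half_width` rows on either side. The names `LogSample` and `add_rolling_mean` are illustrative and are not part of the Dataiku API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogSample:
    well: str     # hypothetical column: well name, used as the partition key
    depth: float  # hypothetical column: measured depth, used as the ordering key
    gr: float     # hypothetical column: gamma-ray log reading


def add_rolling_mean(samples, half_width=1):
    """Return (sample, rolling_mean) pairs.

    Rows are grouped by well, sorted by depth within each well, and each
    rolling_mean averages `gr` over the rows from `half_width` above to
    `half_width` below the current row, never crossing into another well.
    """
    by_well = defaultdict(list)
    for s in samples:
        by_well[s.well].append(s)

    result = []
    for well in sorted(by_well):
        rows = sorted(by_well[well], key=lambda s: s.depth)
        for i, s in enumerate(rows):
            # Clip the frame at the top and bottom of the well.
            lo = max(0, i - half_width)
            hi = min(len(rows), i + half_width + 1)
            window = rows[lo:hi]
            result.append((s, sum(r.gr for r in window) / len(window)))
    return result


# Toy data, invented purely to demonstrate the computation.
samples = [
    LogSample("A", 1.0, 10.0),
    LogSample("A", 2.0, 20.0),
    LogSample("A", 3.0, 30.0),
    LogSample("B", 1.0, 100.0),
    LogSample("B", 2.0, 50.0),
]

for s, mean in add_rolling_mean(samples):
    print(s.well, s.depth, mean)
```

Partitioning by well keeps readings from one borehole from leaking into another well's features, which matters because adjacent rows in a flat table may come from different wells.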