Overview#

Business case#

Facies are uniform sedimentary bodies of rock that are distinguishable enough from each other based on physical characteristics like sedimentary structure and grain sizes.

The ability to classify facies based on their physical characteristics is of great importance in the oil & gas industry. For example, identifying a succession of facies with sandstone units might be indicative of a good reservoir, as these sandstone units tend to have high permeability and porosity which are ideal conditions to store hydrocarbons.

Our goal is to increase knowledge of the subsurface and estimate reservoir capacity. To this end, our data team will estimate the quantity of hydrocarbons in a reservoir by observing the lateral extent and geometries of the facies containing the reservoir units.

Input data#

This use case requires the following two input data sources, available as downloadable archives at the links below:

The facies_vector_screen.csv file contains information about the facies characteristics.
The facies_labels.csv file contains a lookup table mapping the name of each facies type to a number.

These datasets come from the paper Comparison of four approaches to a rock facies classification problem by Dubois et.al.

Prerequisites#

To understand the workflow of this use case, you should be familiar with:

The concepts covered in the recommended courses of the Core Designer learning path
The Windows recipe
Machine Learning in Dataiku
Scenarios and Apps (Optional)

Technical requirements#

Have access to a Dataiku instance — that’s it!

Workflow overview#

The final Dataiku pipeline is shown below. You can also follow along with the completed project in the Dataiku gallery.

The Flow has the following high-level steps:

Upload, join, and clean the datasets.
Train and evaluate a machine learning model.
Generate features and use them to retrain the model.
Perform custom model scoring.
Publish insights to a dashboard.
Create a Dataiku app.