Prepare Input Data

  • Create a new blank Dataiku project and name it Facies classification.

Upload and Join Input Datasets

In this section, we’ll upload the two input datasets and join them into a dataset that contains the facies characteristics and their corresponding labels.

First, let’s create the facies_vector_screen and the facies_labels datasets in the project. From the project’s home page,

  • Click Import Your First Dataset.

  • Click Upload your files.

  • Add the facies_vector_screen.csv file.

  • Click Create to create the facies_vector_screen dataset.

  • Return to the Flow.

  • Create the facies_labels dataset in a similar manner.

  • Return to the Flow.

Flow view with the initial two datasets.
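To script this step instead, uploading a file amounts to reading each CSV into a table of rows. The minimal sketch below uses Python's standard `csv` module, not the Dataiku API. The sample values and the `label` column name in the labels file are hypothetical stand-ins for the real file contents.

```python
import csv
import io
from typing import IO


def load_csv(f: IO[str]) -> list[dict[str, str]]:
    """Read a CSV file object into a list of row dicts keyed by column name.

    All values are kept as strings, exactly as csv.DictReader returns them.
    """
    return list(csv.DictReader(f))


# With the real files you could write, for example:
#   with open("facies_vector_screen.csv", newline="") as f:
#       vectors = load_csv(f)
# Here, small in-memory samples with hypothetical values stand in for the files.
vectors = load_csv(io.StringIO("Facies,GR,PE\n3,77.45,4.6\n2,78.26,4.1\n"))
labels = load_csv(io.StringIO("Facies,label\n2,CSiS\n3,FSiS\n"))  # 'label' is a hypothetical column name

print(len(vectors), list(vectors[0]))  # 2 ['Facies', 'GR', 'PE']
```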

The facies_labels dataset contains a lookup table mapping each facies name to a number. We’ll join this dataset with facies_vector_screen.

  • From the Flow, click the facies_vector_screen dataset to select it. This dataset will serve as the “left” dataset in the Join recipe.

  • Open the right panel and select the Join recipe.

  • Select the facies_labels dataset as the additional input dataset.

  • Name the output dataset facies_with_labels.

  • Click Create Recipe.

By default, Dataiku selects the column Facies as the join key.

Join definition.
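Conceptually, the Join recipe looks up each row of facies_vector_screen in facies_labels using the Facies key. The plain-Python sketch below assumes a left join: every row of the left dataset is kept, and rows with no matching facies get an empty (`None`) label. Check the join type actually selected in your recipe. The `label` column name and the sample rows are hypothetical.

```python
def left_join(left, right, key):
    """Left join two lists of row dicts on `key`.

    Every left row is kept. Non-key columns of the matching right row are added;
    left rows without a match get None for those columns.
    """
    # Lookup table from key value to right row; assumes the key is unique on the
    # right side, as it is in a lookup table such as facies_labels.
    lookup = {row[key]: row for row in right}
    right_cols = [c for c in (right[0] if right else {}) if c != key]
    joined = []
    for row in left:
        match = lookup.get(row[key])
        out = dict(row)  # copy so the input rows are not modified
        for col in right_cols:
            out[col] = match[col] if match is not None else None
        joined.append(out)
    return joined


# Hypothetical sample data; facies "9" has no label, to show an unmatched row.
vectors = [{"Facies": "3", "GR": "77.45"}, {"Facies": "2", "GR": "78.26"}, {"Facies": "9", "GR": "60.1"}]
labels = [{"Facies": "2", "label": "CSiS"}, {"Facies": "3", "label": "FSiS"}]

rows = left_join(vectors, labels, "Facies")
print(rows[0])           # {'Facies': '3', 'GR': '77.45', 'label': 'FSiS'}
print(rows[2]["label"])  # None
```

Building a dictionary from the right dataset keeps the lookup to one pass over each input, which is the usual way to join against a small lookup table.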

  • Click the Selected columns tab and uncheck the Facies column of the facies_vector_screen dataset. This column served as the key that maps each facies characteristics vector to its explicit facies label; after the join, we no longer need it.

Selected columns in the Join recipe.

  • Click Run to run the recipe.

  • Click Update Schema to accept the schema change for the output dataset.

  • Return to the Flow.

View of the Flow after running the Join recipe.
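Unchecking the Facies column is a projection applied to the joined rows. This minimal plain-Python sketch, with hypothetical sample values and a hypothetical `label` column, illustrates why the output schema changes: the resulting rows have one column fewer than the joined rows.

```python
def drop_columns(rows, columns):
    """Return new row dicts without the given columns; the input rows are left untouched."""
    excluded = set(columns)
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


# Hypothetical joined rows: the numeric Facies code plus its attached label.
joined = [
    {"Facies": "3", "GR": "77.45", "label": "FSiS"},
    {"Facies": "2", "GR": "78.26", "label": "CSiS"},
]

facies_with_labels = drop_columns(joined, ["Facies"])
print(facies_with_labels[0])  # {'GR': '77.45', 'label': 'FSiS'}
```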

Prepare the Dataset

Explore the facies_with_labels dataset. Several column names are not intuitive. For example, NM_M stands for “nonmarine/marine indicator” and PE for “photoelectric effect”. We’ll rename these columns using a Prepare recipe.

  • Return to the Flow and click the facies_with_labels dataset to select it.

  • From the right panel, select the Prepare recipe.

  • Keep the default output name facies_with_labels_prepared, and click Create Recipe.

  • Click + Add a New Step.

  • Search for the Rename columns processor and select it.

  • Click +Add Renaming.

  • Rename the following seven columns:

    • GR —> Gamma ray

    • ILD_log10 —> Resistivity

    • PE —> Photoelectric effect

    • DeltaPHI —> Neutron-density porosity difference

    • PHIND —> Average neutron-density porosity

    • NM_M —> Nonmarine/marine indicator

    • RELPOS —> Relative position

In a Prepare recipe, the Rename processor renames seven columns of the input dataset.

  • Click Run to run the recipe, and click Update Schema when prompted.

View of the Flow after running the Prepare recipe.
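The Rename columns step is essentially a mapping from old to new column names applied to every row. The sketch below encodes the seven renamings listed above in plain Python; the sample row and its `label` column are hypothetical, and columns absent from the mapping keep their names and position.

```python
# The seven renamings from the Prepare recipe step above.
RENAMES = {
    "GR": "Gamma ray",
    "ILD_log10": "Resistivity",
    "PE": "Photoelectric effect",
    "DeltaPHI": "Neutron-density porosity difference",
    "PHIND": "Average neutron-density porosity",
    "NM_M": "Nonmarine/marine indicator",
    "RELPOS": "Relative position",
}


def rename_columns(rows, mapping):
    """Return new row dicts with keys renamed per `mapping`; unmapped keys are kept as-is."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


# Hypothetical sample row (values and the 'label' column are illustrative only).
rows = [{"GR": "77.45", "PE": "4.6", "NM_M": "1", "label": "FSiS"}]
prepared = rename_columns(rows, RENAMES)
print(list(prepared[0]))
# ['Gamma ray', 'Photoelectric effect', 'Nonmarine/marine indicator', 'label']
```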