Perform custom scoring

In this section, we’ll perform custom scoring based on domain knowledge that we have about facies classification.

Suppose we know that the types of adjacent (or neighboring) facies are good indicators of the type of a given facies. We can use this information to extend the ground truth for a given facies so that it also includes the types of its neighboring facies. Using this extended ground truth, we can compute new accuracy values for the predictions of the machine learning algorithm.
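As a minimal sketch of this idea, the check below accepts a prediction when it equals the true facies type or one of its neighbors. The name `is_acceptable` and the two-entry `ADJACENT_FACIES` map are illustrative assumptions, not part of the project.

```python
# Illustrative, shortened adjacency map: each facies type lists the
# neighboring types that are also accepted as correct predictions.
ADJACENT_FACIES = {
    "Mudstone": ["Marine siltstone and shale", "Wackestone"],
    "Marine siltstone and shale": ["Mudstone"],
}


def is_acceptable(predicted, actual, adjacency=ADJACENT_FACIES):
    """Return True for an exact match or a prediction of a neighboring facies."""
    return predicted == actual or predicted in adjacency.get(actual, [])


print(is_acceptable("Mudstone", "Mudstone"))    # exact match
print(is_acceptable("Wackestone", "Mudstone"))  # neighbor of the true facies
print(is_acceptable("Dolomite", "Mudstone"))    # neither exact nor neighbor
```

The three calls cover the possible cases: an exact match, a neighboring facies, and a prediction that is rejected.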

To perform the custom scoring, we will use a Python recipe in the Flow.

Create a Python recipe

Our Python recipe will include custom code to implement these tasks:

Treat a prediction in the predictions dataset as correct if the predicted facies type (in the prediction column) matches the row's Facies_type or one of the facies types adjacent to it.
Compute new accuracy values based on this determination.
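The two tasks above can be sketched on a toy table before wiring them into the Flow. The rows below are made up, the adjacency map is shortened, and the column name `adjacent_correct` is a hypothetical choice; only `prediction` and `Facies_type` mirror the real dataset.

```python
import pandas as pd

# Toy stand-in for the predictions dataset; the rows are made up.
predictions = pd.DataFrame({
    "Facies_type": ["Mudstone", "Mudstone", "Dolomite", "Wackestone"],
    "prediction": ["Mudstone", "Wackestone", "Mudstone", "Dolomite"],
})

# Shortened adjacency map covering only the facies types used above.
adjacent = {
    "Mudstone": ["Marine siltstone and shale", "Wackestone"],
    "Wackestone": ["Mudstone", "Dolomite", "Packstone-grainstone"],
    "Dolomite": ["Wackestone", "Packstone-grainstone"],
}

# Task 1: flag each row as correct on an exact or adjacent match.
predictions["adjacent_correct"] = [
    pred == label or pred in adjacent.get(label, [])
    for pred, label in zip(predictions["prediction"], predictions["Facies_type"])
]

# Task 2: the accuracy is the share of rows flagged as correct.
custom_accuracy = float(predictions["adjacent_correct"].mean())
print(custom_accuracy)
```

Here three of the four rows are accepted: one exact match and two adjacent matches. The third row is rejected because Mudstone is not listed as a neighbor of Dolomite.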

To create the Python recipe:

Select the predictions and metrics datasets in the Flow.
From the right panel, select a Python recipe.
In the “New Python Recipe” window, add two new output datasets: custom_predictions and custom_metrics.
Create the recipe.

The Python recipe contains some starter code. You can modify the code as follows:

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import datetime

# Read recipe inputs: the model's predictions
test_Dataset_Prediction = dataiku.Dataset("predictions")
test_Dataset_Prediction_df = test_Dataset_Prediction.get_dataframe()

# Read recipe inputs: the metrics computed so far
test_Dataset_Metrics = dataiku.Dataset("metrics")
test_Dataset_Metrics_df = test_Dataset_Metrics.get_dataframe()

# Map each facies type to the adjacent facies types that also count as correct
dico_adjacent_facies = {}
dico_adjacent_facies['Nonmarine fine siltstone'] = ['Nonmarine coarse siltstone']#1
dico_adjacent_facies['Nonmarine sandstone'] = ['Nonmarine coarse siltstone']#2
dico_adjacent_facies['Nonmarine coarse siltstone'] = ['Nonmarine sandstone', 'Nonmarine fine siltstone']#3
dico_adjacent_facies['Marine siltstone and shale'] = ['Mudstone']#4
dico_adjacent_facies['Mudstone'] = ['Marine siltstone and shale', 'Wackestone']#5
dico_adjacent_facies['Wackestone'] = ['Mudstone', 'Dolomite', 'Packstone-grainstone']#6
dico_adjacent_facies['Dolomite'] = ['Wackestone', 'Packstone-grainstone']#7
dico_adjacent_facies['Packstone-grainstone'] = ['Wackestone', 'Dolomite', 'Phylloid-algal bafflestone']#8
dico_adjacent_facies['Phylloid-algal bafflestone'] = ['Packstone-grainstone', 'Dolomite']#9

def custom_score(pred, label):
    """Return True if pred equals label or is a facies type adjacent to label."""
    return pred == label or pred in dico_adjacent_facies[label]

# Compute recipe outputs from inputs

# Flag each prediction as correct when it matches the true facies type or an adjacent one
test_Dataset_Custom_Metrics_df = test_Dataset_Prediction_df.copy()
test_Dataset_Custom_Metrics_df['Adjacent prediction'] = test_Dataset_Prediction_df.apply(lambda x: custom_score(x['prediction'], x['Facies_type']), axis=1)

# Custom accuracy: the share of predictions flagged as correct
acc = float(test_Dataset_Custom_Metrics_df['Adjacent prediction'].mean())

# Add the custom accuracy as a new, timestamped row of the metrics table
new_row_df = pd.DataFrame([{'accuracy': acc, 'date': datetime.datetime.now()}])
test_metrics_test_df = pd.concat([test_Dataset_Metrics_df, new_row_df], ignore_index=True)

# Write recipe outputs
test_Dataset_Custom_Metrics = dataiku.Dataset("custom_predictions")
test_Dataset_Custom_Metrics.write_with_schema(test_Dataset_Custom_Metrics_df)

# Write recipe outputs
test_metrics_test = dataiku.Dataset("custom_metrics")
test_metrics_test.write_with_schema(test_metrics_test_df)

Click Validate then Run the Python recipe.
Return to the Flow and explore the dataset custom_metrics.
Sort the dataset in ascending order by the date column to see the changes in the metrics from the earliest to the current computation.

Notice that the accuracy obtained with our custom scoring method (0.96) is higher than the accuracy we obtained previously with the default scoring method (0.88). The model's predictions have not changed. The score is higher because the domain knowledge about facies types lets us also count a prediction of an adjacent facies as correct.
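To see why the custom accuracy can only match or exceed the default one, the sketch below scores the same made-up predictions both ways. The pairs and the shortened adjacency map are assumptions for illustration and are unrelated to the 0.88 and 0.96 figures above.

```python
# Shortened adjacency map covering only the facies types used below.
adjacent = {
    "Mudstone": ["Marine siltstone and shale", "Wackestone"],
    "Wackestone": ["Mudstone", "Dolomite", "Packstone-grainstone"],
    "Dolomite": ["Wackestone", "Packstone-grainstone"],
}

# Made-up (true label, prediction) pairs.
pairs = [
    ("Mudstone", "Mudstone"),      # exact match
    ("Wackestone", "Wackestone"),  # exact match
    ("Mudstone", "Wackestone"),    # adjacent match only
    ("Dolomite", "Mudstone"),      # wrong under both scoring rules
    ("Dolomite", "Dolomite"),      # exact match
]

# Default scoring: only exact matches count.
strict_hits = sum(pred == label for label, pred in pairs)

# Custom scoring: exact matches plus predictions of an adjacent facies.
adjacent_hits = sum(
    pred == label or pred in adjacent.get(label, []) for label, pred in pairs
)

strict_accuracy = strict_hits / len(pairs)
adjacent_accuracy = adjacent_hits / len(pairs)
print(strict_accuracy, adjacent_accuracy)
```

Every exact match is also counted by the custom rule, so the custom accuracy is never lower than the default accuracy on the same predictions.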

Return to the Flow.