Concept | How model development impacts MLOps#

As discussed in Dataiku’s O’Reilly ebook introducing MLOps, you can avoid common pitfalls associated with MLOps by performing certain tasks during the model development stage itself.

While model development has several components, we’ll focus on those that can have an impact on MLOps after deployment. The earlier in the MLOps lifecycle these components are considered and measured, the less debt will accumulate.

Alignment with business objectives#

Business objectives should guide model development. By considering them during the design phase, we can build safeguards, such as sanity checks and visualizations, that keep stakeholders informed.

We can then be prepared to take action, such as redesigning and redeploying the model or adjusting business metrics.
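
As a simple illustration of such a safeguard, a sanity check can compare a model’s measured performance against a bar agreed with the business and flag when action is needed. The metric and threshold below are hypothetical assumptions, not a prescribed mechanism.

```python
from sklearn.metrics import roc_auc_score

# Performance bar agreed with business stakeholders during design (hypothetical value).
MIN_ACCEPTABLE_AUC = 0.75

def business_sanity_check(y_true, y_scores, threshold=MIN_ACCEPTABLE_AUC):
    """Return True if the model still meets the agreed performance bar."""
    auc = roc_auc_score(y_true, y_scores)
    if auc < threshold:
        # In practice, this could notify stakeholders or trigger a redesign/redeploy decision.
        print(f"ALERT: AUC {auc:.3f} is below the agreed threshold of {threshold:.2f}")
        return False
    print(f"OK: AUC {auc:.3f} meets the agreed threshold of {threshold:.2f}")
    return True

# Example usage with dummy labels and scores.
business_sanity_check([0, 1, 1, 0, 1], [0.2, 0.9, 0.7, 0.4, 0.3])
```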

We know model development is guided by business objectives when we can answer questions such as these:

  • What is the business problem we want to address? How will the model address it?

  • What level of model performance is acceptable to the business?

  • Is the model negatively or positively impacting the business?

  • How are models monitored over time to detect model deterioration?

  • Who will be responsible for the performance and maintenance of machine learning models in production?

Exploratory Data Analysis (EDA)#

To develop a model, we need suitable data. Once the organization has established the business objectives for the model, we can explore and analyze the data. There are many tests for data suitability (a small sketch automating a few of them follows this list):

  • Is the data quality sufficient? For example, is the data complete and in a suitable format?

  • Can we legally use the data?

  • How can we ensure our data security practices satisfy government regulations, such as the EU General Data Protection Regulation (GDPR)?

  • Will stakeholders understand the data and be able to derive analyses from it?

  • Is the data sufficiently accurate, reliable, and free from bias — both statistical and discriminatory?

  • Will the data be available in real-time once in production?

  • As data changes, how will we continuously monitor and evaluate model performance to ensure the model is behaving as expected?
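
Some of these suitability checks can be partly automated. Below is a minimal pandas sketch; the data and column names are hypothetical stand-ins, not a prescribed workflow.

```python
import pandas as pd

# Toy stand-in for the real dataset; in practice this would come from the project's data sources.
df = pd.DataFrame(
    {
        "customer_segment": ["retail", "retail", "enterprise", "retail", None],
        "age": [34, 51, 29, 130, 42],  # 130 is an implausible value
        "purchase_amount": [120.5, 80.0, None, 45.9, 310.0],
    }
)

# Completeness: share of missing values per column.
print(df.isna().mean())

# Format / validity: expected dtypes and plausible value ranges.
print("purchase_amount is numeric:", pd.api.types.is_numeric_dtype(df["purchase_amount"]))
print("implausible ages:", (~df["age"].between(0, 120)).sum())

# Duplicates that could distort training and evaluation.
print("duplicate rows:", df.duplicated().sum())

# A first look at group imbalance, which can hint at statistical bias.
print(df["customer_segment"].value_counts(normalize=True))
```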

Feature engineering and selection#

Feature engineering and feature selection are crucial steps in model development. Once we better understand the data, we can clean and transform it into features that represent the problem. These features then become the inputs the model learns from, ideally improving its performance.

There are many considerations when selecting features, including the following (a minimal sketch of engineering and selecting features follows the list):

  • What features can we create from existing data according to EDA and business expertise?

  • Does the feature help solve the problem identified by the business objectives?

  • What level of explainability do we need for the features?

  • Is the feature irrelevant or of poor data quality?

  • How will we maintain these features over time?
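
The sketch below illustrates both steps: deriving features from raw data, then screening them for relevance. The column names are hypothetical, and mutual information is used as one possible relevance score.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Toy stand-in for real transaction data; column names are hypothetical.
df = pd.DataFrame(
    {
        "order_date": pd.to_datetime(
            ["2024-01-05", "2024-02-10", "2024-03-15", "2024-01-20",
             "2024-02-25", "2024-03-01", "2024-01-12", "2024-02-18",
             "2024-03-22", "2024-01-30", "2024-02-05", "2024-03-10"]
        ),
        "total_amount": [120, 80, 45, 300, 150, 60, 90, 210, 75, 130, 55, 95],
        "item_count": [3, 2, 1, 5, 3, 2, 3, 4, 1, 2, 1, 2],
        "churned": [0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1],
    }
)

# Feature engineering: derive features suggested by EDA and business expertise.
df["days_since_order"] = (pd.Timestamp("2024-04-01") - df["order_date"]).dt.days
df["amount_per_item"] = df["total_amount"] / df["item_count"]

# Feature selection: keep the features most informative about the target.
features = ["days_since_order", "amount_per_item", "item_count"]
selector = SelectKBest(score_func=mutual_info_classif, k=2).fit(df[features], df["churned"])
print("Selected features:", [f for f, keep in zip(features, selector.get_support()) if keep])
```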

Reproducibility#

Data scientists experiment and iterate on a model many times before it is ready for production.

Sometimes, we want to save specific model versions to use later. For a model to be reproducible, we need version control of all the assets and parameters involved, including the data used to train and evaluate the model (a minimal sketch of recording this information follows the list below).

Here are some tests for model reproducibility:

  • Is the design environment well documented?

  • Can the same model results be reproduced in production?

  • Is there version control of all the assets and parameters involved?
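
Dedicated experiment-tracking and MLOps tools capture much of this automatically. The sketch below only illustrates the idea: saving a model together with its parameters, library versions, and a checksum of the training data. All file names and values are hypothetical.

```python
import hashlib
import json

import joblib
import numpy as np
import sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy training data standing in for a real, versioned dataset.
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=42)
np.savetxt("training_data.csv", np.column_stack([X_train, y_train]), delimiter=",")

def file_checksum(path):
    """Checksum of the training data so the exact dataset version can be verified later."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical hyperparameters; recording them is part of making the run reproducible.
params = {"n_estimators": 200, "max_depth": 8, "random_state": 42}
model = RandomForestClassifier(**params).fit(X_train, y_train)

# Save the model alongside everything needed to reproduce or audit this version.
joblib.dump(model, "model_v1.joblib")
with open("model_v1_metadata.json", "w") as f:
    json.dump(
        {
            "parameters": params,
            "training_data_sha256": file_checksum("training_data.csv"),
            "sklearn_version": sklearn.__version__,
        },
        f,
        indent=2,
    )
```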

Responsible AI#

When building out AI pipelines, it is critical to embed responsible AI in every stage, asking questions and looking for key checkpoints along the way.

We not only want our models to be reproducible, but we also want them to be accountable, sustainable, and governable.

Whether or not responsible AI is a legal requirement, it makes good business sense because it lowers risk. Questions to consider include the following:

  • How can we ensure that our model behaves in ways aligned with vetted business objectives?

  • What protected characteristics (such as ethnicity, gender, age, or religion) can we omit from the model training process to protect the data privacy of our customers, employees, users, and citizens?

  • How do we account for and mitigate model bias and unfairness against certain groups? (A simple disparity check is sketched after this list.)

  • How long can we legitimately retain data beyond its original intended use?

  • Are the means by which we collect and store data in line with regulatory standards such as the GDPR and our own company’s standards?

  • How can we ensure responsible AI over time?
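
As one illustrative check (far from a complete fairness audit), we can compare a model’s positive prediction rate across groups defined by a sensitive attribute. The data and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical evaluation data: model predictions alongside a sensitive attribute.
results = pd.DataFrame(
    {
        "group": ["A", "A", "A", "B", "B", "B", "B"],
        "prediction": [1, 0, 1, 0, 0, 1, 0],
    }
)

# Positive prediction rate per group (a simple demographic parity comparison).
rates = results.groupby("group")["prediction"].mean()
print(rates)

# Gap between the most and least favored groups; a large gap warrants investigation and mitigation.
print(f"Demographic parity gap: {rates.max() - rates.min():.2f}")
```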

See also

See the Responsible AI section of the Knowledge Base to learn more.

Training, evaluation, and drift#

New ML models are built through an intensive and iterative process of training and optimization. We will want to keep track of each experimental model and perform side-by-side comparisons (a minimal comparison sketch follows the list below). We will also have to decide which criteria to use for model evaluation. Some examples of these criteria are:

  • What performance metrics are measured when developing and selecting models?

  • How explainable is the model?

  • How easy is it to deploy the model?

  • How does the model treat members of different groups (i.e., is the model fair)?
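
As a minimal sketch of such a side-by-side comparison, the snippet below evaluates two candidate models on the same cross-validation folds with the same metric. The dataset is a toy stand-in for real project data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy stand-in for real project data.
X, y = make_classification(n_samples=500, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),  # simpler, more explainable
    "random_forest": RandomForestClassifier(random_state=0),   # often stronger, less transparent
}

# Evaluate every experimental model on the same folds so the comparison is fair.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC {scores.mean():.3f} (+/- {scores.std():.3f})")
```

In practice, the comparison would also weigh explainability, ease of deployment, and fairness, not just the headline metric.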

On top of model evaluation, which ensures we choose the best algorithm, we also need to set up validation tools, such as the drift check sketched after this list, to help answer:

  • Will the machine learning model still be aligned with the business objective in a day, a month, a year?

  • Can we track data drift so that we know when to update the model?
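
One common way to track data drift is to compare the distribution of each input feature in production against its distribution at training time, for example with a two-sample Kolmogorov-Smirnov test. The data and threshold below are illustrative.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Toy stand-ins: a feature at training time and the same feature observed in production (shifted).
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
prod_feature = rng.normal(loc=0.4, scale=1.0, size=1000)

# Two-sample KS test: a small p-value suggests the feature's distribution has drifted.
statistic, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.01:  # illustrative alert threshold
    print(f"Possible drift detected (KS statistic {statistic:.3f}, p-value {p_value:.4f})")
else:
    print("No significant drift detected for this feature")
```

A drift alert like this can then feed the decision to retrain or redesign the model, closing the loop with the business objectives discussed earlier.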

Resources for further exploration#