Solution | LLM Provider Due Diligence

Overview

This Solution provides governance and procurement teams with a series of templates designed to manage the approval and usage of LLMs across Dataiku.

Introduction to LLM Provider Due Diligence

The growing use of third-party LLMs across the enterprise requires a comprehensive approach to managing information about LLM providers, tracking LLM usage, and enforcing guardrails.

First, governance and procurement teams need to review providers and their associated models, and complete due diligence on them, before allowing downstream usage. Once a model has been approved for usage, appropriate guardrails should be defined and applied across connections. Finally, governance teams may wish to monitor usage and ensure that relevant checks for consistency and fairness are applied before an AI system is deployed.

To support these activities, Dataiku offers the LLM Provider Due Diligence Solution, which provides a series of templates to manage the approval and usage of LLMs across Dataiku. Combined with guardrails available in the LLM Mesh and integrated tracking of projects that use GenAI in Govern, this solution provides an end-to-end approach to governing LLMs in the enterprise.

Installation

Please reach out to your Account Team, who will support you with the Solution installation.

Technical Requirements

For this Solution to work on Dataiku’s Govern Node, the user must have access to Advanced Govern. The Solution can be implemented on either cloud or self-managed instances of Dataiku.

We recommend running Dataiku v13.0.0 or newer. These versions have conditional workflow views and table artifact references enabled, which the Solution needs for full functionality. On older versions of Dataiku and Dataiku Govern, you will need to adapt the conditionality, table views, and Python hooks used in the Solution.

Walkthrough

From the newly created “LLM Providers” custom page, click the “Create” button to generate a new Vendor Profile artifact. This artifact can be edited to reflect pertinent information about the LLM provider, including contact details, information security practices, and relevant regulation or compliance expectations.

Dataiku screenshot of created vendor profile artifact for LLM providers.

To build the associated model cards for this provider, select “Edit” on the artifact and “+ Add” under the Associated Models field. From the pop-up modal, you can create a new model card or select a pre-existing one if it has not already been linked to another vendor. The Model Card template is designed in alignment with Hugging Face’s annotated model cards template. On the model card artifact, you can fill out information related to the development, training, and testing of the LLM, using the details supplied by the provider. These fields can also be populated programmatically by writing to them through the Govern API.
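As a minimal sketch of the programmatic route, the helper below assumes a client that behaves like `GovernClient` from the `dataikuapi` package, where an artifact exposes `get_definition()` and the definition exposes `get_raw()` and `save()`. The field IDs, host, API key, and artifact ID shown in the comments are hypothetical placeholders; the real field IDs depend on the Model Card blueprint installed on your Govern node.

```python
def update_model_card(client, artifact_id, field_values):
    """Write field_values into a Govern artifact's fields and save it.

    `client` is assumed to behave like dataikuapi.GovernClient.
    Returns the artifact's fields dict after the update.
    """
    artifact = client.get_artifact(artifact_id)
    definition = artifact.get_definition()
    raw = definition.get_raw()
    # Assumption: field values live under the "fields" key of the raw definition.
    fields = raw.setdefault("fields", {})
    fields.update(field_values)
    definition.save()
    return fields


# Example usage (all identifiers below are hypothetical):
#
#   import dataikuapi
#   client = dataikuapi.GovernClient("https://govern.example.com", "YOUR_API_KEY")
#   update_model_card(
#       client,
#       "ar.1234",
#       {"intended_use": "Internal summarization", "training_data": "Per provider docs"},
#   )
```

Check the field IDs against the blueprint version in the blueprint designer before running anything like this against a production Govern node.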

Dataiku screenshot showing a built model card for the provider.

From the model card artifact, you can create or select an existing Connection artifact to link to the model card. While Model Cards can only be linked to a single provider, it is possible to enable the same model through multiple connections in the LLM Mesh, and this is reflected in Govern as a list of all associated connections that have enabled that specific model.

Additionally, a single Connection artifact can be linked to multiple Model Cards, depending on how the connection has been designed in the LLM Mesh. The Connection artifact in Govern reflects the connection metadata from the LLM Mesh, which is configured separately on the Design node. This information can be entered manually in Govern once the connection has been established on the Design node.
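The relationships described above (each Model Card belongs to one provider, while Model Cards and Connections link many-to-many) can be pictured with a small data model. This is an illustrative sketch only: the class names, attributes, and example values are hypothetical and do not correspond to Govern's internal schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity-based comparison, which avoids recursive
# equality checks across the cyclic references between these objects.
@dataclass(eq=False)
class Connection:
    name: str
    model_cards: List["ModelCard"] = field(default_factory=list)


@dataclass(eq=False)
class ModelCard:
    name: str
    provider: Optional["Provider"] = None
    connections: List[Connection] = field(default_factory=list)


@dataclass(eq=False)
class Provider:
    name: str
    model_cards: List[ModelCard] = field(default_factory=list)

    def add_model(self, card: ModelCard) -> None:
        # A model card may only ever be linked to a single provider.
        if card.provider is not None and card.provider is not self:
            raise ValueError(f"{card.name} is already linked to {card.provider.name}")
        card.provider = self
        if card not in self.model_cards:
            self.model_cards.append(card)


def link(card: ModelCard, connection: Connection) -> None:
    """Record a many-to-many link between a model card and a connection."""
    if connection not in card.connections:
        card.connections.append(connection)
    if card not in connection.model_cards:
        connection.model_cards.append(card)


# Hypothetical example: one model enabled through two connections.
vendor = Provider("ExampleAI")
card = ModelCard("example-model-large")
vendor.add_model(card)
link(card, Connection("mesh-conn-prod"))
link(card, Connection("mesh-conn-sandbox"))
```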

You may also use Dataiku APIs to programmatically populate metadata in Govern to reflect the settings and constraints set on the LLM Mesh side.
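One way to decide which Govern fields need updating is to compare the settings read from the LLM Mesh with what the Connection artifact currently records. The sketch below assumes both sides have already been fetched into plain dictionaries; the key names are hypothetical and would need to be mapped to your actual connection settings and blueprint field IDs.

```python
def find_drift(mesh_settings, govern_fields):
    """Return {key: (mesh_value, govern_value)} for each key in
    mesh_settings whose value is missing or different in govern_fields."""
    drift = {}
    for key, mesh_value in mesh_settings.items():
        govern_value = govern_fields.get(key)
        if govern_value != mesh_value:
            drift[key] = (mesh_value, govern_value)
    return drift


# Hypothetical example values for a single connection.
mesh_settings = {
    "pii_detection_enabled": True,
    "toxicity_detection_enabled": True,
    "enabled_models": ["example-model-large"],
}
govern_fields = {
    "pii_detection_enabled": True,
    "toxicity_detection_enabled": False,
}

for key, (mesh_value, govern_value) in find_drift(mesh_settings, govern_fields).items():
    print(f"{key}: LLM Mesh={mesh_value!r}, Govern={govern_value!r}")
```

Treating the LLM Mesh side as the source of truth matches the description of the Connection artifact as a reflection of the connection metadata; the keys reported here are the ones to write back to Govern.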

The Connection artifact provides an easy-to-find view into the guardrails set on a specific connection, allowing governance teams to ensure that connections used in downstream projects are aligned with their expectations for appropriate usage. From the Connection page, you can also manually link a Govern Project into a list of projects that use this connection as another form of organization across instances.

Dataiku screenshot showing a connection artifact in Govern.

Finally, this solution optionally provides a custom Bundle template for projects that make use of LLMs. When governing a new bundle, you can select the “LLM Bundle” template from the dropdown. This template includes a series of workflow steps to assess core questions on security, data privacy, fairness, and overall evaluation metrics related to the use of an LLM within the bundle itself.

Within the bundle template, checks focus on the outputs of an LLM application (such as a sentiment analysis recipe or an instance of Dataiku Answers) and on whether the application meets the standards required before deployment. The sign-off step can act as a gate to deployment, depending on how production infrastructure has been configured. The questions and workflow steps in this template are meant as a baseline set of checks on a bundle before deployment. Using the blueprint designer, you can extend the template with additional questions or steps as your organization deems appropriate.

Dataiku screenshot showing final sign off of an LLM in Govern.