FAQ | Can I detect toxic language?

Because LLMs work by predicting the next most likely word in a sequence, they can reproduce toxic language or other harmful biases learned during training.

Toxicity detection is key to ensuring that a model's input and output do not contain inappropriate or offensive content, and to defining appropriate remedies when they do.

To mitigate this risk, Dataiku administrators can configure the LLM connection to moderate content in both queries and responses, leveraging dedicated content moderation models to filter the input and output.

Screenshot of the toxicity detection settings in an LLM connection.

If toxic content is detected, whether in a query or a response, the API call fails with an explicit error.
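Applications that call a moderated connection through the LLM Mesh Python API can handle this failure and fall back gracefully. The sketch below is a minimal, non-authoritative example: the LLM ID and the fallback message are placeholders, and the exact behavior of a blocked call (an unsuccessful response versus a raised exception) depends on the connection and the moderation model configured by the administrator.

```python
import dataiku

# Hypothetical LLM ID: replace with the ID of an LLM on a connection
# where an administrator has enabled content moderation.
LLM_ID = "openai:my-moderated-connection:gpt-4o-mini"

client = dataiku.api_client()
project = client.get_default_project()
llm = project.get_llm(LLM_ID)

completion = llm.new_completion()
completion.with_message("Summarize this customer email: ...")

try:
    resp = completion.execute()
    if resp.success:
        print(resp.text)
    else:
        # The call did not succeed, for example because the moderation
        # model flagged the query or the response; degrade gracefully.
        print("The request was blocked or failed; please rephrase your input.")
except Exception as err:
    # Depending on the failure mode, the client may raise an error
    # instead of returning an unsuccessful response.
    print(f"LLM call failed: {err}")
```

Catching the failure at the application level lets you surface a user-friendly message instead of propagating the raw moderation error.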