FAQ | Can I detect toxic language?#
Since LLMs work by predicting the next most likely word in a sequence, they can propagate toxic language or otherwise harmful biases that they have learned during the training process.
Toxicity detection is key to ensuring that the model’s input and output do not contain inappropriate or offensive content, and to defining appropriate remedies when they do.
To mitigate this risk, Dataiku administrators can configure the LLM connection to moderate content in both queries and responses, leveraging dedicated content moderation models to filter input and output.
If toxic content is detected, whether in a query or a response, the API call fails with an explicit error.
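Because a moderated call fails rather than returning partial text, application code should anticipate the failure and supply a remedy. The sketch below is illustrative only: the `CompletionResponse` class and `safe_answer` helper are hypothetical names, with the response shape (a success flag plus text or an error message) loosely modeled on an LLM completion result; consult the actual LLM Mesh Python API for the real objects.

```python
from dataclasses import dataclass

# Hypothetical response shape (illustrative, not the actual Dataiku class):
# a success flag plus either generated text or an error message.
@dataclass
class CompletionResponse:
    success: bool
    text: str = ""
    error_message: str = ""

def safe_answer(resp: CompletionResponse, fallback: str) -> str:
    """Return the model's text, or a neutral fallback when the call
    failed, e.g. because content moderation rejected the query or response."""
    if resp.success:
        return resp.text
    # Moderated calls fail with an explicit error; log it and surface
    # a predefined remedy to the end user instead of the raw error.
    print(f"LLM call failed: {resp.error_message}")
    return fallback

# A moderated (failed) call falls back to a safe, user-facing message.
blocked = CompletionResponse(success=False,
                             error_message="Content moderation rejected the query")
print(safe_answer(blocked, "Sorry, I can't help with that request."))
```

The key design point is that the fallback path is explicit: the caller decides the remedy (a canned message, a retry with a rephrased query, an escalation) rather than exposing the raw moderation error to end users.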