Concept | Summarize text recipe

The Summarize text recipe enables you to condense texts into shorter summaries using large language models (LLMs).

If the input text is too long for the LLM to handle in a single request, the recipe:

  1. Splits the long input text into manageable sections.

  2. Processes each section separately.

  3. Combines the results into a single, coherent summary.
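The chunk-and-combine strategy above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Dataiku's implementation. Word counts stand in for token counts, `summarize_fn` is a hypothetical placeholder for a call to whichever LLM you select, and `max_words` and `max_rounds` are illustrative parameters. The sketch follows a map-reduce pattern: it summarizes each chunk, joins the partial summaries, and repeats until the result fits in a single request.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int) -> List[str]:
    """Split text into chunks of at most max_words words.

    Assumption: word count is a crude stand-in for the LLM's token count.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(
    text: str,
    summarize_fn: Callable[[str], str],  # hypothetical stand-in for an LLM call
    max_words: int = 1000,
    max_rounds: int = 5,
) -> str:
    """Summarize chunks, join the partial summaries, and repeat until they fit."""
    for _ in range(max_rounds):
        # The text fits in one request: summarize it directly.
        if len(text.split()) <= max_words:
            return summarize_fn(text)
        # 1. Split the input. 2. Summarize each section separately.
        partial = [summarize_fn(chunk) for chunk in chunk_text(text, max_words)]
        # 3. Join the partial summaries and loop to combine them.
        text = "\n".join(partial)
    raise RuntimeError("partial summaries never became short enough to combine")


# Toy summarizer for illustration: keeps the first three words of its input.
def toy_summarize(s: str) -> str:
    return " ".join(s.split()[:3])


text = " ".join(f"w{i}" for i in range(25))
print(summarize_long_text(text, toy_summarize, max_words=10))  # w0 w1 w2
```

With `max_words=10`, the 25-word text is split into three chunks. Their three-word summaries are joined into nine words, and a final call returns `w0 w1 w2`. The `max_rounds` guard stops the loop if the summaries never become short enough, for instance when a model returns text as long as its input.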

Prerequisites

To use the Summarize text recipe, you’ll need:

  • A Dataiku instance (version 12.3 or later). Dataiku Cloud is compatible.

  • A connection to at least one supported generative AI model. Your administrator must configure it beforehand in the Administration panel > Connections > New connection > LLM Mesh.

    Supported connections include providers such as OpenAI, Hugging Face, and Cohere, among others.

Text summarization in Dataiku using generative AI.

Recipe settings

On the recipe settings page, you:

  1. Select the LLM to use. The dropdown lists only the connections available in the current instance. Some models are designed specifically for summarization, but you can also use general-purpose text generation models such as OpenAI GPT-3.5.

  2. Select the input text column, that is, the column that contains the text to summarize.

  3. Optionally, specify the language in which the summary should be written.

  4. Optionally, set the desired summary length. Depending on the model, you can set a minimum and a maximum length, expressed in tokens.

Screenshot of the settings page of a Summarize text recipe.

Note

The settings may vary from one model to another.
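To make the four settings concrete, the sketch below models them as a Python dataclass. It also shows one way a client could pass them to a generic text generation model: the optional language goes into the prompt, and the optional token bounds become generation parameters. Everything here is hypothetical. The class, its field names, the prompt wording, and the `min_tokens`/`max_tokens` parameter names are illustrative assumptions, not the recipe's actual configuration or prompt.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SummarizeSettings:
    """Hypothetical mirror of the recipe settings; names are illustrative only."""
    llm_id: str                       # the selected LLM (hypothetical identifier)
    text_column: str                  # input column holding the text to summarize
    language: Optional[str] = None    # optional language for the summary
    min_tokens: Optional[int] = None  # optional minimum summary length, in tokens
    max_tokens: Optional[int] = None  # optional maximum summary length, in tokens

    def __post_init__(self) -> None:
        # Reject bounds that no summary could satisfy.
        for name in ("min_tokens", "max_tokens"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be a positive integer")
        if (self.min_tokens is not None and self.max_tokens is not None
                and self.min_tokens > self.max_tokens):
            raise ValueError("min_tokens must not exceed max_tokens")


def build_request(settings: SummarizeSettings, text: str) -> Tuple[str, Dict[str, int]]:
    """Return (prompt, generation parameters) for one input text."""
    instruction = "Summarize the following text."
    if settings.language:
        instruction += f" Write the summary in {settings.language}."
    # Only include the length bounds that were actually set.
    params: Dict[str, int] = {}
    if settings.min_tokens is not None:
        params["min_tokens"] = settings.min_tokens
    if settings.max_tokens is not None:
        params["max_tokens"] = settings.max_tokens
    return f"{instruction}\n\n{text}", params


settings = SummarizeSettings(llm_id="my-llm", text_column="review_text",
                             language="French", max_tokens=100)
prompt, params = build_request(settings, "Some long review.")
print(params)  # {'max_tokens': 100}
```

The prompt is kept separate from the length parameters because, as noted above, settings vary by model. A model with native length controls could use `params` directly, while one without them might need the limits restated in the instruction.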

Output dataset

After you run the recipe, the output dataset contains:

  • The summary for each row, in the summarized_text column.

  • Any error messages, in the llm_error_message column. This column is filled in only when the LLM returns an error for that row; otherwise, it is left blank.

Screenshot of an output dataset of the Summarize text recipe.
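When you process the output downstream, you may want to separate rows that received a summary from rows where the LLM call failed. In a Dataiku Python recipe or notebook, you would typically load the dataset with `dataiku.Dataset(...).get_dataframe()` and filter it with pandas. The minimal sketch below uses plain dictionaries instead, so it runs without Dataiku. The column names `summarized_text` and `llm_error_message` come from the recipe output. The helper functions and the sample rows are hypothetical.

```python
import math
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def is_blank(value: Any) -> bool:
    """True for None, NaN (how pandas often shows empty cells), or empty strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def split_by_error(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (summarized, failed) using the llm_error_message column."""
    summarized: List[Row] = []
    failed: List[Row] = []
    for row in rows:
        if is_blank(row.get("llm_error_message")):
            summarized.append(row)
        else:
            failed.append(row)
    return summarized, failed


# Made-up sample rows shaped like the recipe output.
rows = [
    {"summarized_text": "A short summary.", "llm_error_message": ""},
    {"summarized_text": "", "llm_error_message": "example error text"},
    {"summarized_text": "Another summary.", "llm_error_message": float("nan")},
]
summarized, failed = split_by_error(rows)
print(len(summarized), len(failed))  # 2 1
```

Treating `None`, NaN, and empty strings alike as "no error" keeps the check robust to how the blank field is represented after export or loading.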

What’s next?

Continue learning about text summarization with LLMs by working through the Tutorial | Summarize text with generative AI article.