Concept | Summarize text recipe
The Summarize text recipe uses large language models (LLMs) to condense texts into shorter summaries.
If the input text is too long to be managed by the LLM, the recipe:
Intelligently chunks the long input text into manageable sections.
Processes each section separately.
Compiles the results from all sections into one coherent summary.
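The chunk, process, and compile steps above can be illustrated with a minimal Python sketch. It is not the recipe's actual implementation, which this article doesn't describe: the function names (`chunk_text`, `summarize_long_text`, `toy_summarize`), the word-based chunk size, and the stand-in summarizer are all assumptions made for illustration. In practice, the `summarize` callable would wrap a call to an LLM.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int) -> List[str]:
    """Split text into chunks of at most max_words words (hypothetical helper)."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 200,
) -> str:
    """Summarize each chunk separately, then summarize the combined partial summaries.

    Simplification: the combined partial summaries are assumed to fit in a
    single call; a real system might need to repeat the process.
    """
    chunks = chunk_text(text, max_words)
    if not chunks:
        return ""
    if len(chunks) == 1:
        # Short input: no chunking needed.
        return summarize(chunks[0])
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize(" ".join(partial_summaries))


def toy_summarize(text: str) -> str:
    """Stand-in for an LLM call (assumption): keep only the first three words."""
    return " ".join(text.split()[:3])


if __name__ == "__main__":
    long_text = "one two three four five six seven eight"
    print(summarize_long_text(long_text, toy_summarize, max_words=4))
```

Splitting on a fixed word count keeps the sketch short. Splitting on sentence or paragraph boundaries would usually preserve more context in each section.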
To use the Summarize text recipe, you’ll need:
A Dataiku instance (version 12.3 or later). Dataiku Cloud is compatible.
A connection to at least one supported generative AI model. Your administrator must configure this connection beforehand in the Administration panel > Connections > New connection > LLM Mesh.
Supported connections include providers such as OpenAI, Hugging Face, and Cohere.
On the recipe settings page, you:
Select which LLM to use. The dropdown lists only the connections available in the current instance. Some models are specifically designed for summarization, but you can also use general-purpose text generation models such as OpenAI GPT 3.5.
Assign a text column as input. This column contains the text to summarize.
Optionally, specify a language in which the summary should be written.
Optionally, set the length of the desired summary. Depending on the model, you can set a minimum and maximum summary length expressed in tokens.
The available settings may vary from one model to another.
After running the recipe, if you look at the output dataset, you can see:
The summary for each row in the summarized_text column.
Any error messages in the llm_error_message column. This column is filled in only when the model returns an error. Otherwise, it stays blank.
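To check an output dataset for failed rows, you can separate the rows where llm_error_message is blank from those where it is filled in. The following is a minimal sketch in plain Python. The two column names come from the output described above, but the `split_by_error` helper and the sample rows are made up for illustration and don't come from a real run.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def split_by_error(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (successful_rows, failed_rows) based on llm_error_message.

    A row counts as failed when llm_error_message is non-blank.
    Missing, None, or whitespace-only values are treated as blank.
    """
    successful: List[Row] = []
    failed: List[Row] = []
    for row in rows:
        message = (row.get("llm_error_message") or "").strip()
        if message:
            failed.append(row)
        else:
            successful.append(row)
    return successful, failed


# Illustrative rows only (assumption): not real recipe output.
sample_rows: List[Row] = [
    {"summarized_text": "A short summary.", "llm_error_message": ""},
    {"summarized_text": None, "llm_error_message": "Example error text"},
]

if __name__ == "__main__":
    successful, failed = split_by_error(sample_rows)
    print(f"{len(successful)} summarized, {len(failed)} failed")
```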
Continue learning about text summarization with LLMs by working through the Tutorial | Summarize text with generative AI article.