Tutorial | Semantic models for agents#

Get started#

Imagine that you work with sales data for an electronics retailer. You are tasked with creating an AI agent that helps your less technical colleagues answer questions from a sales database.

While testing the agent, you’ve noticed that it sometimes needs additional context to accurately answer queries. For example, users might frequently request specific aggregations or data by fiscal year, which aren’t defined in the dataset.

To help solve these issues, you can create a semantic model, which builds a foundation of context between structured datasets and the LLM that queries them.

Note

This use case includes a small, simplified example of sales data to reduce the processing costs and time associated with LLM usage.

Objectives#

In this tutorial, you will:

  • Build a semantic model with business context.

  • Learn the components of a semantic model, including entities and glossary.

  • Test the semantic model.

  • Use the model in an agent via a tool.

Prerequisites#

To complete this tutorial, you will need:

  • Dataiku 14.4 or later.

  • Data Designer, Advanced Analytics Designer, or Full Designer profile.

  • Connections and a plugin installed by your administrator:

Tip

If you’re not familiar with agents in Dataiku, here are some helpful resources:

Create the project#

  1. From the Dataiku Design homepage, click + New Project.

  2. Select Learning projects.

  3. Search for and select Semantic Models Lab.

  4. If needed, change the folder into which the project will be installed, and click Create.

  5. From the project homepage, click Go to Flow (or type g + f).

Note

You can also download the starter project from this website and import it as a ZIP file.

Prepare the data#

Before building a semantic model, you need to store the dataset in your SQL connection.

  1. With sales_data selected in the Flow, open the right Actions panel.

  2. Create a Sync recipe.

  3. Select your SQL connection information to store the data into.

  4. Click Create Recipe.

  5. In the recipe Configuration page, click Run.

Screenshot showing how to sync a dataset to a SQL connection.

It’s also a good practice to create column descriptions before putting data into the Semantic Models Lab.

This data already includes metadata for the dataset and each column. Take a moment to read through the descriptions by hovering over each column header.

Tip

In your own projects, if AI assistants are enabled on your instance, you can generate metadata using an LLM. Otherwise, you can add descriptions manually in the Schema panel or by clicking on each column header and selecting Edit column schema.

Screenshot showing column descriptions for the dataset.

Create a semantic model#

First, create a new semantic model.

  1. Go to the GenAI menu (GenAI icon.) > Semantic Models.

  2. Click + New Semantic Model.

  3. Name it Sales model.

  4. Click Create.

Screenshot showing steps to create a new semantic model.

Generate entities#

The first step to building a semantic model is creating entities, or the tables that will be linked with specific context.

You can do this manually or by using an LLM to create entities according to your instructions. This tutorial will use the automatic mode with an LLM.

  1. Click + Add Entity.

  2. Choose the Automatic tab.

  3. Under Project, choose this project (the key should start with TUT_SEMANTIC_MODELS_LAB).

  4. For Dataset, select sales_data_copy.

  5. For LLM, choose from your available LLM connections.

  6. In the Additional instructions, copy and paste the following:

    Create three entities with the following columns in each:
    
    - transactions: sales_id, date, product_id, customer_id, quantity, unit_price
    - customers: customer_id, region, customer_status, customer_name
    - products: product_id, product, product_type, product_desc
    
  7. Click Generate.

  8. After the model finishes, review the information in the window, then click Add to model.

Screenshot showing the steps to create entities.

Important

LLMs can produce output that differs from the instructions. If your entities don’t match the instructions, you can manually delete or add columns after adding them to the model. You’ll need to do this to ensure that you can replicate the rest of this tutorial.

After adding the entities to the model, click through them in the left panel and review the information in each. The columns are known as attributes. Note that you can change attribute names, the source column each is mapped to, the data type, and the description.

Index the values#

The next step is to index the distinct values for each entity’s attributes. Indexing is necessary for the model’s term resolution, which is when a user’s query terms are fuzzy-matched to values from these attributes.

  1. Navigate to Indexing in the left panel.

  2. Choose the Embedding LLM to use for indexing.

  3. Click Save.

  4. Select Start Indexing.

  5. Wait for the page to display Indexing completed.

Screenshot showing the steps to create entities.

Add metrics#

Within each entity, you can use SQL expressions to define metrics, which are aggregations of the attributes. For this sales data, it would be useful to add metrics for total sales across the entire table and revenue per transaction (among others).

  1. Navigate to the transactions entity.

  2. Under Metrics, click + Add metric.

  3. Name this metric total_sales.

  4. In the SQL expression window, type SUM(quantity * unit_price).

  5. In the description window, type Total sales value (before returns).

  6. Add another metric with these attributes:

    • Name: avg_rev_per_sale

    • SQL expression: SUM(quantity * unit_price) / COUNT(DISTINCT sales_id)

    • Description: Average revenue per sales transaction

  7. Click Save.

Screenshot showing steps to add metrics to an entity.

Add relationships#

To ensure that queries will work correctly, you can define the relationships between tables.

In this case, the LLM should have automatically created relationships when it set up the entities.

  1. Navigate to Relationships in the left panel.

  2. Check that the relationships are set correctly. If not, add these two relationships manually:

    • transactions relates to customers on transactions.customer_id = customers.customer_id

    • transactions relates to products on transactions.product_id = products.product_id

Screenshot showing steps to check or add relationships among the entities.

Add a Glossary#

The Glossary is a collection of terms, along with their definitions and synonyms, that build business context around the data. You can add terms automatically from business documents or manually, as you will do in this tutorial.

  1. Go to the Glossary in the left panel.

  2. Click + Add Terms and choose Add manually.

  3. Add the Term region.

  4. For the Description, copy and paste this text:

    Region is an area of the United States comprised of certain states. Each region is defined as:
    
         Northeast: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont, New Jersey, New York, Pennsylvania, Delaware, Maryland
         Midwest: Illinois, Indiana, Michigan, Ohio, Wisconsin, Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, South Dakota
         South: Florida, Georgia, North Carolina, South Carolina, Virginia, West Virginia, Alabama, Kentucky, Mississippi, Tennessee, Arkansas, Louisiana, Oklahoma, Texas
    Southwest: Arizona, Colorado, Idaho, Nevada, New Mexico, Utah
    West: Montana, Wyoming, California, Oregon, Washington, Alaska, Hawaii
    
  5. In Synonyms, write area.

  6. Click Add.

  7. Add another term:

    • Term: fiscal year

    • Description: The fiscal year is from July 1 to June 30.

    • Synonyms: business year, financial year, accounting year

  8. Click Add and Done.

Screenshot showing steps to add terms to the glossary of a semantic model.

In this case, the region term should be linked to an attribute so the model understands that it defines the regions listed in the data. The fiscal year term should remain unlinked so the model uses it as general context when it answers queries.

  1. In the Glossary, click on the Used tab.

  2. Click + Use Terms and choose Manually.

  3. Make the following selections:

    • Term: region

    • Entity: customers

    • Target Type: Attribute

    • Target Element: region

  4. Click OK.

  5. Under General Terms, click in the Search and add box, then add fiscal year.

  6. Click Save.

Screenshot showing steps to link glossary terms to the semantic model.

The model now has access to both terms from the Glossary.

Add instructions#

You can add instructions that will be used in every query to help guide the LLM. In this example, the instructions will be simple, though in the real world they can be much more complex.

  1. Navigate to Instructions in the left panel.

  2. Copy and paste the following instructions:

Use the sales data to answer questions about sales by product, region, customer, etc.

If a user asks about sales for a state, use the region associated with that state and specify that data is available only by region, not by state.
  1. Click Save.

Screenshot showing steps to add instructions for an LLM to use the semantic model.

Test the model#

The model is now ready to test in the Playground! You can test a number of example user queries to see how the model responds.

In this case, you can use some queries designed to test the metrics, glossary definitions, and instructions you set up.

  1. Navigate to Playground in the left panel.

  2. Choose an LLM from your available connections.

  3. Copy and paste this Question to test the metric for total sales, and Run Query:

    What is the total of all sales by region?
    
  4. When the query finishes, review the results. Scrolling down the page, you can see how the model uses glossary terms and the generated SQL query behind the results it returned.

Screenshot showing steps to test a semantic model.

If you like the results, you can save this as a golden query, which can be used to save the expected output of frequent or complex queries.

  1. Click Save as Golden Query.

  2. Give the query a name, such as total sales by region.

  3. Click Save.

  4. Navigate to Golden Queries in the left panel and check that the query has been saved.

Note

You might need to refresh your browser page to see the Golden Query saved.

Screenshot showing steps to save a golden query in a semantic model.

You can continue testing the model with a few other queries. Here are some to copy, or try your own!

  • Test the metric for average revenue per sale:

What was the average revenue per sale in 2024?
  • Test the definition of region, and the instructions to use region instead of state:

What products are sold the most in Massachusetts?
  • Test the definition of fiscal year:

What is the total sales value for fiscal year 2024-25?

Important

Responses to the queries will vary depending on the LLM. In addition to testing multiple queries, you can use the Playground to test responses from various LLMs if you have multiple connections.

Use the model in an agent#

After you’re happy with the semantic model, you can use it in an agent with the Semantic Model Query tool. The tool queries structured data with SQL, using the semantic model as a context layer.

Create a new agent#

First, create the agent.

  1. Go to the GenAI menu (GenAI icon.) > Agents & GenAI Models.

  2. Click + New Agent.

  3. Select Simple Visual Agent, give it a name, such as sales data agent and click Create.

  4. In the agent Design page, choose an LLM connection.

  5. In the Instructions window, copy and paste this text:

You are a helpful assistant who queries sales data and provides answers to non-technical users. If the user asks a question that is not about the provided sales data, respond that you are unable to answer.
Screenshot showing steps to create a new agent.

Create the query tool#

Now, you can create a tool that queries the data using the semantic model.

  1. Click + Create Tool.

  2. Choose a Semantic Model Query tool.

  3. Give it the name query_data and click Create.

  4. In the tool Design page, select the Semantic Model name.

  5. Under LLM Configuration, select from your available LLM and Embedding LLM connections.

  6. Click Save.

Screenshot showing steps to create a Semantic Model Query tool.

Next, add the tool to the data analysis agent.

  1. Return to the Agents & GenAI Models page in the GenAI menu.

  2. Click on the sales data agent name to open the agent.

  3. Click + Add Tool and select the query_data tool.

  4. Click Save in the top right corner.

Screenshot showing steps to add a tool to the agent.

Test the agent#

Chat with the agent to test it within the agent Design page. You might want to start with some of the same test queries from the semantic model.

  1. Copy and paste this question into the Chat window.

What was the average revenue per sale in 2024?
  1. Continue to test the agent with some of these queries, or try your own:

Which product generated the most total sales value in the Northeast region?
List the average revenue per sale for each region in 2025.
What is the total quantity sold for each product type?
Screenshot showing steps to test the agent that uses the semantic model.

Next steps#

Congratulations! You have successfully built a semantic model via the Semantic Model Lab and used it in an agent!

Your next steps might be to learn about using agents in Agent Hub.