Tutorial | Building and exploring a graph for patient data analysis#

Get started#

Graph analytics can be useful across many industries to help uncover hidden relationships in complex datasets.

In this tutorial, you’ll use the Dataiku Visual Graph plugin—a practical way to apply graph analytics—to transform complex, disconnected data into a clear visual network. This approach empowers both technical and non-technical users to explore insights.

Important

You can follow the tutorial from start to finish, or focus only on your role:

  • If you are a data scientist (editor), start from the beginning to learn how to design graphs.

  • If you are a business analyst (explorer), you can jump ahead to the Explorer’s journey (for non-technical users) to work with a ready-made project.

Objectives#

In this tutorial, you will:

  • Interactively set up and build a graph using the Visual Graph editor webapp.

  • Analyze and explore the graph to answer specific business questions.

  • Publish a graph for non-technical users to explore.

  • Explore a graph using the Visual Graph explorer webapp.

Prerequisites#

To complete this tutorial, you’ll need the following:

  • Dataiku 14.0 or later.

  • A Full Designer user profile for the editor’s journey (no specific profile requirement for the explorer’s journey).

  • Visual Graph plugin.

  • Basic knowledge of Dataiku (Core Designer level or equivalent).

Use case summary#

Imagine facing a common challenge: you have several datasets about patients, drugs, and diseases, but the relationships between them are hidden. A drug prescribed to a patient might conflict with another medication, yet this information is scattered across different tables.

Here are the datasets you’ll be working with:

| Dataset | Description |
| --- | --- |
| Patient data | Contains patient data such as names, ages, and associated diseases. |
| Drug data | Lists drugs available on the market and the diseases they treat. |
| Disease data | Provides a taxonomy of diseases. |
| Drug contraindications | Identifies drug combinations that shouldn’t be prescribed together. |

Your datasets hold all the necessary clues—from patient names and ages to lists of drugs, diseases, and critical contraindication details. The difficulty is bringing these pieces together into a form that reveals the bigger picture.

By creating a graph from these datasets, you will be able to uncover connections that are otherwise invisible in spreadsheets—for example, which patients are at risk due to conflicting prescriptions.

This narrative highlights why graphs are valuable and sets the stage for both technical and non-technical users to explore them effectively.

Create the project#

  1. From the Dataiku Design homepage, click + New Project.

  2. Select Learning projects.

  3. Search for and select Visual Graph Plugin.

  4. If needed, change the folder into which the project will be installed, and click Create.

  5. From the project homepage, click Go to Flow (or type g + f).

Note

You can also download the starter project from this website and import it as a zip file.

Editor’s journey (for data scientists)#

As a data scientist, your journey begins with creating and configuring graphs. In this section, you’ll learn how to build the foundation of your graph, interact with it, and prepare it for collaboration with your business teams.

Set up the editor webapp#

The first step is to create and configure your editor webapp.

  1. From the top navigation bar, open the Code menu and click Webapps.

  2. Click + New Webapp > Visual Webapp.

  3. Select Visual Graph - Editor.

  4. Name the webapp Patient Drug editor.

  5. Click Create. This opens the Edit tab of the new webapp.

Screenshot editor webapp creation

Warning

If you’re working on a Dataiku Cloud instance, you need to install the Visual Graph plugin from the Store first. For details, see Installing plugins in the reference documentation.

Next, configure the webapp by connecting it to the source datasets that will populate the graph, and by defining internal storage datasets where the graph structure and configurations will be saved. This setup ensures that the webapp can both build new graphs and publish them for others to explore.

  1. In the Source Datasets for Nodes section, add: Patients, Diseases, and Drugs.

  2. For Source Datasets for Edges, add: Drugs, DrugContraindications, and PatientDiseases.

  3. Under Visual Graph Editor Internal Datasets, create a dataset of any type named graph_storage to use as the Internal Storage Dataset.

  4. For the Saved Configuration Dataset, create a dataset of any type, named graph_config.

  5. Under Connection for Publishing Graphs, select the connection filesystem_folders.

  6. Click Save to finish the configuration.

Note

Optionally, you can enable the AI Assistant by selecting your preferred LLM Connection. This allows the webapp to generate AI-assisted Cypher queries.

This feature requires a connection to at least one supported Generative AI model. Your administrator must configure it beforehand in the Administration panel > Connections > New connection > LLM Mesh. Supported connections include providers such as OpenAI, Hugging Face, and Cohere.

Screenshot editor webapp creation

Your webapp is now fully configured. Switch to the View tab to start working with it.

Interactive graph creation#

You are now inside the interactive workspace. Create a new graph to highlight drug contraindications.

  1. From the left panel, click Create a new graph.

  2. Click the three vertical dots and rename the graph to Drug Contraindications.

The next step is to configure the graph by adding nodes (the entities) and edges (the relationships).

Create node groups#

Node groups define the entities that will appear in the graph. Create your first node group.

  1. From the left panel, under Node groups, click Add group.

  2. Name it Patients.

  3. Select the dataset Patients.

  4. Select patient_id as the column with unique identifiers. The webapp creates one node for each unique value in this column.

  5. Select name as the column with names. This column’s values define the node names.

  6. Add columns with additional properties by selecting the column age. This will attach the age property to the nodes.

  7. Click Add.
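The mapping described in the steps above (one node per unique ID, a name column, and optional property columns) can be sketched in plain Python. This is only an illustration of the idea, not the plugin's implementation: the sample rows, the `build_node_group` helper, and the "first occurrence wins" handling of duplicate IDs are all assumptions.

```python
# Hypothetical sample rows standing in for the Patients dataset.
patients = [
    {"patient_id": "P1", "name": "Alice", "age": 54},
    {"patient_id": "P2", "name": "Bob", "age": 37},
    {"patient_id": "P1", "name": "Alice", "age": 54},  # repeated ID
]


def build_node_group(rows, group, id_col, name_col, property_cols=()):
    """Return one node per unique value of id_col.

    Assumption: when an ID appears more than once, the first row is kept.
    """
    nodes = {}
    for row in rows:
        node_id = row[id_col]
        if node_id in nodes:
            continue  # a node already exists for this identifier
        nodes[node_id] = {
            "group": group,
            "name": row[name_col],
            "properties": {col: row[col] for col in property_cols},
        }
    return nodes


patient_nodes = build_node_group(
    patients, "Patients", "patient_id", "name", property_cols=["age"]
)
print(len(patient_nodes))  # two nodes, even though there are three rows
```

The same helper would cover the Diseases and Drugs groups by changing the ID, name, and property columns.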

first node group creation

Repeat this process to create two additional groups, using these parameters:

| Group name | Source dataset | Node IDs column | Node names column | Additional properties column |
| --- | --- | --- | --- | --- |
| Diseases | Diseases | disease_id | name | None |
| Drugs | Drugs | drug_id | name | used_for_disease |

Note

As you add groups, the graph updates in the center of the screen, letting you instantly visualize your nodes. You can also customize groups by changing their color, icon, or size, using the Customize section.

Create edge groups#

Using the same logic, define edge groups. They represent how nodes are connected.

Create your first edge group.

  1. From the left panel, next to Edge groups, click Add group.

  2. Name it DrugtoDiseases.

  3. As the source node group, select Drugs.

  4. As the target node group, select Diseases.

  5. Under Edge source, choose the Drugs dataset.

  6. Select drug_id as the column with source identifiers. This column should contain identifiers of the source node group.

  7. Select used_for_disease as the column with target identifiers. This column should contain identifiers of the target node group.

  8. Leave the additional properties field empty and click Add.
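As a rough illustration of the steps above, an edge group can be thought of as one directed edge per row of the edge source dataset, linking a source identifier to a target identifier. The sketch below is not the plugin's code: the IDs, the third drug, and the `build_edge_group` helper are hypothetical, and skipping rows whose endpoints are missing from the node groups is an assumption about how dangling references might be handled.

```python
# Hypothetical rows standing in for the Drugs dataset (IDs are made up).
drugs = [
    {"drug_id": "D1", "name": "Sertraline", "used_for_disease": "DIS1"},
    {"drug_id": "D2", "name": "Lorazepam", "used_for_disease": "DIS2"},
    {"drug_id": "D3", "name": "ExampleDrug", "used_for_disease": "DIS9"},
]
drug_ids = {"D1", "D2", "D3"}   # identifiers of the Drugs node group
disease_ids = {"DIS1", "DIS2"}  # identifiers of the Diseases node group


def build_edge_group(rows, group, source_col, target_col,
                     source_ids, target_ids, property_cols=()):
    """Return one directed edge per row whose endpoints both exist."""
    edges = []
    for row in rows:
        src, dst = row[source_col], row[target_col]
        if src not in source_ids or dst not in target_ids:
            continue  # assumption: rows pointing at unknown nodes are skipped
        edges.append({
            "group": group,
            "source": src,
            "target": dst,
            "properties": {col: row[col] for col in property_cols},
        })
    return edges


drug_to_diseases = build_edge_group(
    drugs, "DrugtoDiseases", "drug_id", "used_for_disease",
    drug_ids, disease_ids,
)
print([(e["source"], e["target"]) for e in drug_to_diseases])
```

Passing `property_cols=["reason"]` would attach the reason column to each edge in the same way the DrugContradictions group does.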

first edge group creation

Repeat these steps to create two additional edge groups, using these parameters:

| Group name | Source node group | Target node group | Edge source dataset | Source IDs column | Target IDs column | Additional properties column |
| --- | --- | --- | --- | --- | --- | --- |
| DrugContradictions | Drugs | Drugs | DrugContraindications | drug_id_1 | drug_id_2 | reason |
| PatientDiseases | Diseases | Patients | PatientDiseases | disease_id | patient_id | None |

Note

As with nodes, the graph updates as you add edge groups. This helps you uncover relationships across datasets in real time. You can also adjust the appearance of edges for readability, under the Customize section.

Well done—you’ve just designed your first graph!

entire graph

Try zooming, dragging, and clicking on nodes and edges to explore what you’ve built.

Tip

Remember, this workspace allows you to create multiple graphs within the same project, each designed for a different purpose. For example, you could have one graph to explore drug contraindications, another to analyze disease prevalence, and others for specialized analyses. Take advantage of this flexibility to organize your insights efficiently.

Analyze the graph#

Now that you’ve designed the graph, you can begin analyzing it using the query panel.

Run Cypher queries#

To answer specific business questions, run Cypher queries and save them so business analysts (explorers) can easily reuse them.

Try an example. The goal is to identify drugs used for depression and check their contraindications. To do so:

  1. From the bottom panel, open the New query tab.

  2. Paste the following Cypher query:

    MATCH (d:Drugs)-[r1:DrugtoDiseases]->(dis:Diseases)
    WHERE LOWER(dis.name) = LOWER("Depression")
    OPTIONAL MATCH (d)-[r2:DrugContradictions]->(d2:Drugs)
    RETURN d, r1, dis, r2, d2

  3. Click Execute to run the query. The graph will update to show the results.

  4. Name it Contraindications of depression drugs.

  5. Click Save to make it available for business analysts.
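To make the query's logic easier to follow, here is a rough Python analogue of its MATCH, WHERE, and OPTIONAL MATCH clauses. It is a sketch under assumptions: the IDs and the direction of each contraindication edge are invented for illustration, and only the drug and disease names come from the tutorial. MATCH keeps drugs linked to the target disease, the case-insensitive comparison mirrors LOWER(...), and OPTIONAL MATCH keeps a drug even when it has no outgoing contraindication, filling the missing part with None (null in Cypher).

```python
# Illustrative data; IDs and edge directions are assumptions.
disease_names = {"DIS1": "Depression", "DIS2": "Anxiety"}
drug_names = {"D1": "Sertraline", "D2": "Escitalopram", "D3": "Lorazepam"}
drug_to_disease = [("D1", "DIS1"), ("D2", "DIS1"), ("D3", "DIS2")]
contraindications = [("D1", "D2"), ("D1", "D3")]  # directed (source, target)


def drugs_for_disease_with_contraindications(target="Depression"):
    rows = []
    for drug, disease in drug_to_disease:
        # WHERE LOWER(dis.name) = LOWER(target)
        if disease_names[disease].lower() != target.lower():
            continue
        # OPTIONAL MATCH (d)-[:DrugContradictions]->(d2): outgoing edges only
        targets = [dst for src, dst in contraindications if src == drug]
        if not targets:
            rows.append((drug_names[drug], disease_names[disease], None))
        for dst in targets:
            rows.append((drug_names[drug], disease_names[disease], drug_names[dst]))
    return rows


for row in drugs_for_disease_with_contraindications("depression"):
    print(row)
```

Because the pattern is directed, a drug that appears only as the target of a contraindication edge is returned with None in this sketch, which is why Escitalopram still shows up in the result.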

Query result

The results highlight that Sertraline and Escitalopram (used for depression) shouldn’t be taken together, and Sertraline has a contraindication with Lorazepam.

To dig deeper into the results, try the following:

  1. Click the edge between Sertraline and Lorazepam to see the specific issues that occur if these drugs are prescribed together.

  2. Click the Lorazepam node to open its popup. From there, go to the Neighbors tab and select Expand 2 remaining neighbors. This reveals that Lorazepam is used to treat Anxiety, along with another contraindication.

With this additional context, you see that while Sertraline is contraindicated with Lorazepam, a patient suffering from both depression and anxiety could safely take Escitalopram together with Lorazepam.
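The reasoning above amounts to a symmetric lookup: a contraindication applies to a pair of drugs regardless of which one is stored as the source of the edge. A minimal sketch, using only the two pairs named in this section (the additional Lorazepam contraindication is not named in the text, so it is left out), with a hypothetical `can_combine` helper:

```python
# Contraindicated pairs named in this section, stored without direction.
contraindicated_pairs = {
    frozenset(("Sertraline", "Escitalopram")),
    frozenset(("Sertraline", "Lorazepam")),
}


def can_combine(drug_a, drug_b):
    """Return True when no contraindication edge links the two drugs."""
    return frozenset((drug_a, drug_b)) not in contraindicated_pairs


print(can_combine("Lorazepam", "Sertraline"))
print(can_combine("Escitalopram", "Lorazepam"))
```

Using `frozenset` means the order of the two arguments does not matter, unlike the directed pattern in the Cypher query.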

Next, run the following additional queries and save each one:

  • Patients with diabetes:

    MATCH (d:Diseases)-[e:PatientDiseases]->(p:Patients)
    WHERE LOWER(d.name) = LOWER("diabetes")
    RETURN d, e, p
    
  • Number of patients with multiple diseases:

    MATCH (d:Diseases)-[r:PatientDiseases]->(p:Patients)
    WITH p, COUNT(d) AS disease_count
    WHERE disease_count > 1
    RETURN COUNT(p) AS number_of_patients
    

    Note

    Since this query returns a numeric value, the results automatically display in the Table view of the webapp.
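The aggregation in the second query (group by patient, count diseases, keep patients with more than one, then count those patients) can be sketched in Python as follows. The edge list is made up for illustration. Like COUNT(d) in the query, this version counts edges per patient, so a duplicated patient/disease row would be counted twice.

```python
from collections import Counter

# Hypothetical PatientDiseases edges as (disease_id, patient_id) pairs.
patient_diseases = [
    ("DIS1", "P1"),
    ("DIS2", "P1"),
    ("DIS1", "P2"),
    ("DIS3", "P3"),
    ("DIS2", "P3"),
]

# WITH p, COUNT(d) AS disease_count
disease_count = Counter(patient for _, patient in patient_diseases)

# WHERE disease_count > 1 ... RETURN COUNT(p) AS number_of_patients
number_of_patients = sum(1 for count in disease_count.values() if count > 1)
print(number_of_patients)
```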

Saving these queries helps you build a reusable query library for exploration and analysis by your team.

Tip

Need help writing or understanding queries? You can access the official Cypher documentation at any time by clicking the Cypher doc link in the bottom-right corner.

Optional: Use the AI assistant#

Instead of writing queries manually, you can optionally use the AI assistant (if enabled) to generate Cypher queries from natural language prompts:

  1. Open the Query generator tab.

  2. Enter your prompt describing the analysis you want to run.

  3. The AI assistant will generate a Cypher query and display the results.

Query generator

Publish the graph#

Once you’re satisfied with your graph and have saved useful queries, the final step is to publish the graph so that it can be reused and explored by others.

Export configurations#

Start by saving your graph configurations.

  1. From the left panel, under Saved configurations, click Save current configuration.

  2. Keep the current name, Drug Contraindications.

  3. Click Save configuration.

export

Build and deploy#

Now that your graph configuration is ready, publish it using the Build graph plugin recipe.

Normally, after clicking Save configuration, a popup appears automatically. Confirm the settings and click Publish. This deploys your graph directly into the Flow and makes it accessible in the Explorer webapp.

Note

If the popup doesn’t appear, or if you want to publish the graph later, follow these instructions:

  1. Under Saved configurations, locate the graph configuration you just saved, and click the publish icon.

  2. Confirm by clicking Publish.

build settings

Tip

Alternatively, you can publish a graph manually by going to the Flow (g + f), selecting the Build graph recipe, configuring the input datasets and saved configuration, and then running the recipe. This method gives you more control over the publication settings.

Your graph is now published:

  • It’s ready to be explored by non-technical users in the Explorer webapp.

  • It’s also available in the Flow for further analysis or integration with other recipes.

Built graph in the flow

Set up the explorer webapp#

You’ve just built and published your graph. The last step is to prepare the Explorer webapp so analysts can interact with the published graph.

  1. From the top navigation bar, open the Code menu and click Webapps.

  2. Click on + New Webapp > Visual Webapp.

  3. Select Visual Graph - Explorer.

  4. Name it Patient Drug explorer.

  5. Click Create. This will bring you to the Edit tab of the webapp.

  6. Select your Drug Contraindications folder as the Folder containing the graph databases.

  7. (Optional) If needed, set up an LLM connection. This allows explorers to generate their own Cypher queries from prompts.

  8. Click Save.

Explorer webapp config

Go to the View tab and make sure your Drug Contraindications graph appears in the list. Your explorer webapp is now ready for interactive exploration.

Advanced analytics#

The graph you’ve built is more than just a visualization—it’s a source of structured insights that you can explore and analyze further in the Flow.

Once your graph is ready, you can try out different ways to gain deeper understanding:

  • Run graph algorithms such as PageRank (using the Compute PageRank recipe) to identify the most influential drugs, diseases, or patients in your network.

  • Interact with your graph using agents to ask questions in natural language and explore insights directly from your data.

    Example agent
  • Leverage your graph in machine learning workflows, for example to generate features or enrich datasets used in predictive models.

  • Combine multiple queries and saved configurations to create a library of analyses that can be reused for reporting or further exploration.
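To give a feel for the PageRank idea mentioned in the list above, here is a minimal power-iteration sketch in plain Python. It is not the Compute PageRank recipe and makes no claim about how the plugin computes scores: the `pagerank` function, the damping factor of 0.85, the fixed iteration count, and the tiny edge list are all assumptions for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Basic PageRank by power iteration on a list of directed edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical DrugtoDiseases edges: two drugs treat DIS1, one treats DIS2.
scores = pagerank([("D1", "DIS1"), ("D2", "DIS1"), ("D3", "DIS2")])
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

In this toy graph the disease with more incoming drug edges ends up with the higher score, which is the intuition behind using PageRank to surface influential nodes.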

Think of this section as an invitation to explore and experiment with the tools and recipes provided by the Visual Graph plugin. Even without writing code, you can uncover patterns, test hypotheses, and see how your graph can support more advanced analytics.

Explorer’s journey (for non-technical users)#

Now that the data scientist has created and published the graph, the explorer’s journey begins. This section will guide you through exploring the published graph to answer business questions without writing any code.

Important

If you’re an analyst and have skipped the editor’s journey instructions, you can still join in. Download this prepared project and import it directly into your Dataiku instance by clicking + New Project > Import project. This will allow you to start with the published graph and follow only the explorer’s journey.

Access the explorer webapp#

Your first step is to open the Explorer webapp. This will be your workspace for graph exploration.

  1. From the top navigation bar, open the Code menu and click Webapps.

  2. Open the Patient Drug explorer webapp. You should see one available graph named Drug Contraindications.

  3. Click the graph to enter the visualization area.

Access explorer webapp

You’ll see the graph in the center of the screen.

Interact with the graph#

On the left panel, you can see the nodes and edges that were previously defined by the data scientists. Node groups represent entities such as patients, drugs, or diseases. Edge groups represent relationships between these entities, like which drugs are prescribed to a patient or which drugs have contraindications with each other.

You can interactively explore the network by:

  • Zooming in and out.

  • Clicking on individual nodes to view their properties (for example patient details, disease connections, prescribed drugs).

  • Clicking on edges to inspect relationships.

For example, clicking on a patient’s node will show you their information, as well as their connections to diseases and prescribed drugs.

Interact with the graph

Try to explore the graph and click around—this is the best way to uncover insights and understand the data.

Tip

To make exploration easier, you can temporarily hide nodes or edges by clicking the eye icon next to their group.

Run queries#

Beyond simply exploring the graph visually, you can also query it to answer specific business questions. The explorer webapp offers different ways to do this, ranging from ready-to-use queries to fully custom ones.

Explore pre-built queries#

The explorer webapp includes pre-built Cypher queries that the data scientist has created and saved for you. These queries let you quickly answer common questions without writing any code.

To run a query:

  1. On the bottom panel, open the Saved queries tab.

  2. Click the play button for the query you want to run.

The graph will update to show only the relevant nodes and edges, helping you focus on the information you need.

Interact with the graph

Try running the available queries and explore the results to see how the graph responds.

Tip

To dig deeper into results, you can click on nodes to see more information and use the Neighbors tab to expand connected nodes. You can also click on edges to inspect the relationships between nodes.

Note

Examples of pre-built queries include:

  • Count the number of patients with multiple diseases.

  • Analyze drug contraindications in depression treatment.

  • Identify patients with diabetes.

Create new queries#

In addition to these pre-built queries, you can also create your own at any time:

  • If you are familiar with Cypher, write your own queries directly in the New query tab.

  • If enabled by the data scientists, use the AI Assistant under the Query generator tab to generate queries from natural language prompts.

For example, you could ask: Show patients with migraine and their prescribed drugs.

The assistant can then translate this into a Cypher query and run it against the graph.

Explore the graph

Tip

If the AI assistant isn’t available, you could use your own external LLM (such as ChatGPT) to craft Cypher queries. Simply copy the generated query into the New query tab and click Execute.
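An external LLM knows nothing about this graph, so it helps to describe the node groups, edge groups, and edge directions in the prompt. The snippet below is one possible way to assemble such a prompt in Python. The group names come from the graph built in this tutorial; the `build_prompt` helper, the prompt wording, and the assumption that property names match the source column names are illustrative only.

```python
# Schema of the Drug Contraindications graph as configured in the editor.
node_groups = {
    "Patients": ["name", "age"],
    "Diseases": ["name"],
    "Drugs": ["name", "used_for_disease"],
}
edge_groups = [
    ("Drugs", "DrugtoDiseases", "Diseases"),
    ("Drugs", "DrugContradictions", "Drugs"),
    ("Diseases", "PatientDiseases", "Patients"),
]


def build_prompt(question):
    """Assemble a plain-text prompt that states the schema, then the question."""
    lines = ["Write a Cypher query for a graph with this schema.", "Node labels:"]
    for label, properties in node_groups.items():
        lines.append(f"  (:{label}) with properties {', '.join(properties)}")
    lines.append("Relationships (direction matters):")
    for source, relation, target in edge_groups:
        lines.append(f"  (:{source})-[:{relation}]->(:{target})")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


print(build_prompt("Which patients have more than one disease?"))
```

Stating the direction of each relationship matters because, for example, PatientDiseases edges in this graph point from diseases to patients.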

Continue exploring by writing queries yourself or using an LLM to answer different business questions.

Next steps#

Congratulations! You’ve successfully created and explored a graph with the Visual Graph plugin. Feel free to continue refining it and exploring other possible settings to meet your use cases.

See also

For more information on the Visual Graph plugin, see the reference documentation.