Understand the project#

See a screencast covering this section’s steps

First take a moment to understand the goals for this quick start and the data at hand.

Objectives#

Rather than designing new elements as in the other task-based quick starts, this one focuses on how to collaborate with colleagues and use the AI capabilities they have already created as inputs to your own objectives.

In this quick start, you’ll:

  • Understand a project’s objectives by reviewing the Flow.

  • Recognize how group assignments impact project security.

  • Communicate insights with a dashboard.

  • Run a colleague’s workload by using both an automation scenario and a Dataiku app.

Note

Users with the AI Consumer user profile can perform all of the actions featured in this quick start.

Tip

To check your work, you can review a completed version of this entire project from data preparation through MLOps on the Dataiku gallery.

Review the Flow#

One of the first concepts a user needs to understand about Dataiku is the Flow. The Flow is the visual representation of how datasets, recipes (steps for data transformation), models, and agents work together to move data through an analytics pipeline.

Dataiku has its own visual grammar to organize AI and analytics projects in a collaborative way.

  • Dataset (square): The icon on the square represents the dataset’s storage location, such as Amazon S3, Snowflake, PostgreSQL, etc.

  • Recipe (circle): The icon on the circle represents the type of data transformation, such as a broom for a Prepare recipe or coiled snakes for a Python recipe.

  • Model or Agent (diamond): The icon on the diamond represents the type of modeling task (such as prediction, clustering, or time series forecasting) or the type of agent (such as visual or code).

Tip

In addition to shape, color has meaning too.

  • Datasets are blue. Those shared from other projects are black.

  • Visual recipes are yellow. Code recipes are orange. Plugin recipes are red.

  • Machine learning elements are green.

  • Generative AI elements are pink.

Take a look at the items in the Flow now!

  1. If not already there, from the (Flow icon.) menu in the top navigation bar, select the Flow (or use the keyboard shortcut g + f).

Dataiku screenshot of the starting Flow for AI collaboration.

Tip

There are many other keyboard shortcuts beyond g + f. Type ? to pull up a menu or see the Accessibility page in the reference documentation.
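The Flow is primarily a visual tool, but the same project structure is also available programmatically. The following is a minimal sketch, not part of the quick start steps, assuming it runs in a Python notebook inside this project (where the dataiku package is available); depending on your Dataiku version, the listed items behave like dictionaries or list-item objects.

  # A minimal sketch: survey the Flow's items from a Python notebook in this project.
  import dataiku

  client = dataiku.api_client()
  project = client.get_project(dataiku.default_project_key())

  # Datasets: the "type" reflects the storage location shown on the square icon.
  for dataset in project.list_datasets():
      print("Dataset:", dataset["name"], "| storage:", dataset["type"])

  # Recipes: the "type" reflects the transformation shown on the circle icon.
  for recipe in project.list_recipes():
      print("Recipe:", recipe["name"], "| type:", recipe["type"])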

Use the right panel to review an item’s details#

To collaborate on a project, you’ll need to quickly get up to speed on what someone else’s Flow accomplishes. Try to figure out the purpose of this one.

  1. At the far left of the Flow, click once on the job_postings dataset to select it.

  2. Click the Details (Details icon.) icon in the right panel to review its metadata.

  3. Click the Schema (Schema icon.) icon underneath to see its columns.

  4. Click Preview (or use the keyboard shortcut shift + p) to pull up the first few rows of data.

  5. Beginning with the Prepare recipe at the start of the pipeline, review the recipes that transform the job_postings dataset at the far left into the jobs_sampled dataset at the far right. Click once to select each one, and review the Details tab to help determine what it does.

Dataiku screenshot of the details tab of an item in the Flow.
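If you prefer code, the same details are available through Dataiku’s Python API. The following is a minimal sketch, assuming a Python notebook inside this project; job_postings is the dataset selected above.

  # A minimal sketch: inspect a dataset's schema and preview a few rows.
  import dataiku

  job_postings = dataiku.Dataset("job_postings")

  # Columns and storage types, similar to the Schema panel.
  for column in job_postings.read_schema():
      print(column["name"], column["type"])

  # First few rows, similar to the Preview panel.
  df = job_postings.get_dataframe(limit=5)
  print(df)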

See also

For more details on what’s found in the Flow itself, see the other task-based quick starts.

You could read the project’s wiki (use the keyboard shortcut g + w) for more information. However, from just browsing the Flow, you probably already have a good idea of what this project does:

  • The pipeline starts by preparing some data.

  • It then builds a prediction model to classify a job posting as real or fake.

  • The final zone is the starting point for applying large language models (LLMs) on this data.

The readability of the Flow eases the challenge of bringing users of diverse skill sets and responsibilities onto the same platform. For example:

  • The Flow includes visual recipes (in yellow) that everyone can understand, as well as custom code recipes (in orange).

  • The Flow is divided into interconnected Flow zones, which can be useful for organizing the different stages of a project.

Dataiku screenshot highlighting the readability of the Flow.

Build the Flow#

Unlike the initial uploaded datasets, the downstream datasets appear as outlines. This is because no one has built them. In other words, no one has run the relevant recipes to populate these datasets. However, this isn’t a problem because the Flow contains the recipes required to create these outputs at any time.

  1. Open the Flow Actions menu.

  2. Click Build all.

  3. Leaving the default options, click Build to run the recipes necessary to create the items furthest downstream.

    Dataiku screenshot of the dialog for building the Flow.
  4. When the job completes, refresh the page to see the built Flow.
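The same kind of build can also be triggered from Dataiku’s Python API. The following is a minimal sketch, assuming a Python notebook inside this project; jobs_sampled is the downstream dataset named earlier, and a recursive build runs the upstream recipes needed to produce it, similar in spirit to the Build action above.

  # A minimal sketch: build a downstream dataset and wait for the job to finish.
  import dataiku

  client = dataiku.api_client()
  project = client.get_project(dataiku.default_project_key())

  job = project.new_job("RECURSIVE_BUILD")   # build upstream dependencies as needed
  job.with_output("jobs_sampled")            # the dataset at the far right of the Flow
  job.start_and_wait()                       # blocks until the job completes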