Quick Start | Dataiku for AI collaboration#

Get started#

Recent advancements in generative AI have made it easy to apply for jobs. But be careful! Scammers have also been known to create fake job postings in the hopes of stealing personal information. Let’s see if you, with Dataiku’s help, can tell a real job posting from a fake one!

Objectives#

Rather than designing new elements like in the quick starts for data preparation, machine learning, or MLOps, this quick start focuses on how to collaborate with colleagues and use the AI capabilities they have already created as inputs for your own objectives.

In this quick start, you’ll:

  • Understand a project’s objectives by reviewing the Flow.

  • Recognize how group assignments impact project security.

  • Communicate insights with a dashboard.

  • Run a colleague’s workload by using both an automation scenario and a Dataiku application.

Note

All capabilities featured in this quick start can be completed with the Explorer user profile.

Tip

To check your work, you can review a completed version of this entire project from data preparation through MLOps on the Dataiku gallery.

Create an account#

To follow along with the steps in this tutorial, you need access to a Dataiku instance running version 12.6 or later. If you do not already have access, you can get started in one of two ways:

  • Start a 14-day free trial. See this how-to for help if needed.

  • Install the free edition locally for your operating system.

Open Dataiku#

The first step is getting to the homepage of your Dataiku Design node.

  1. Go to the Launchpad.

  2. Click Open Instance in the Design node tile of the Overview panel once your instance has powered up.

  3. See this how-to if you encounter any difficulties.

Important

If using a self-managed version of Dataiku, including the locally-downloaded free edition on Mac or Windows, open the Dataiku Design node directly in your browser.

Create the project#

Once you are on the Design node homepage, you can create the tutorial project.

  1. From the Dataiku Design homepage, click + New Project.

  2. Click DSS tutorials in the dropdown menu.

  3. In the dialog, click Quick Starts in the left-hand panel.

  4. Choose AI Collaboration Quick Start, and then click OK.

Dataiku screenshot of the dialog for creating a new project.

Note

You can also download the starter project from this website and import it as a zip file.

Are you using an Explorer profile?

Explorer profiles do not include the permission to create a new project. However, as a Designer on a trial or free edition, you’ll be able to do this on your own!

If using an Explorer profile, have your instance administrator follow the steps below so you can complete the quick start:

  1. Create the project above.

  2. Build the Flow.

  3. Assume the role of the Score Data scenario’s last author by making an arbitrary change to the scenario (such as to the trigger) and saving it.

  4. Grant you permission to access the project.

Review the Flow#


One of the first concepts a user needs to understand about Dataiku is the Flow. The Flow is the visual representation of how datasets, recipes (steps for data transformation), and models work together to move data through an analytics pipeline.

See the Flow’s visual grammar#

Dataiku has its own visual grammar to organize AI, machine learning, and analytics projects in a collaborative way.

| Shape | Item | Icon |
|---|---|---|
| Square | Dataset | The icon on the square represents the dataset’s storage location, such as Amazon S3, Snowflake, PostgreSQL, etc. |
| Circle | Recipe | The icon on the circle represents the type of data transformation, such as a broom for a Prepare recipe or coiled snakes for a Python recipe. |
| Diamond | Model | The icon on the diamond represents the type of modeling task, such as prediction, clustering, time series forecasting, etc. |

Tip

In addition to shape, color has meaning too.

  • Datasets are blue, but those shared from other projects are black.

  • Visual recipes are yellow. Code recipes are orange. LLM recipes are pink. Plugin recipes are red.

  • Machine learning elements are green.

Take a look at the items in the Flow now!

  1. If not already there, from the left-most menu in the top navigation bar, click on the Flow (or use the keyboard shortcut g + f).

Dataiku screenshot of the starting Flow for AI collaboration.

Tip

There are many other keyboard shortcuts beyond g + f. Type ? to pull up a menu or see the Accessibility page in the reference documentation.

Use the right panel to review an item’s details#

To collaborate on a project, you’ll need to quickly get up to speed on what someone else’s Flow accomplishes. Let’s try to figure out the purpose of this one.

  1. Click once on the job_postings dataset to select it.

  2. Click the Details icon to learn more about this item.

  3. Click on the Schema tab underneath to see its columns.

  4. Click on the test_scored dataset at the end of the pipeline, and review the same tabs. Note the addition of a prediction column.

  5. Review the recipes that transform job_postings to test_scored beginning with the Prepare recipe at the start of the pipeline. Click once to select each one, and review the Details tabs to help determine what they do.

Dataiku screenshot of the details tab of an item in the Flow.
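
The schema comparison in the steps above can also be scripted. The following is a minimal sketch, not an official recipe: it assumes a project handle that behaves like the one returned by the dataikuapi Python client, where get_dataset(name).get_schema() returns a dictionary with a "columns" list. The helper name added_columns is hypothetical.

```python
def added_columns(project, upstream_name, downstream_name):
    """Return column names found in the downstream dataset but not upstream.

    `project` is assumed to expose get_dataset(name).get_schema(), returning
    {"columns": [{"name": ..., "type": ...}, ...]} (an assumption based on
    the dataikuapi client).
    """
    def column_names(dataset_name):
        schema = project.get_dataset(dataset_name).get_schema()
        return [column["name"] for column in schema["columns"]]

    upstream = set(column_names(upstream_name))
    # Preserve the downstream column order in the result.
    return [name for name in column_names(downstream_name) if name not in upstream]
```

With a real connection, the project handle would typically come from dataikuapi.DSSClient(host, api_key).get_project(project_key), where all three values are placeholders for your own instance. Calling added_columns(project, "job_postings", "test_scored") should then surface the columns added along the pipeline, such as the prediction column noted above.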

Note

The model in this project happens to be a simple AutoML model. However, you can think of it as a placeholder for any kind of model — not only those built in Dataiku, but also custom models imported into Dataiku.

You could read the project’s wiki (use the keyboard shortcut g + w) for more information, but from just browsing the Flow, you probably already have a good idea of what this project does. The pipeline prepares some data and builds a prediction model in order to classify a job posting as real or fake.

The readability of the Flow eases the challenge of bringing users of diverse skill sets and responsibilities onto the same platform. For example:

  • The Flow has visual recipes (in yellow) that can be understood by all, but also custom code (in orange).

  • The Flow is divided into two interconnected Flow zones, which can be useful for teams focused on different stages of a project.

Dataiku screenshot highlighting the readability of the Flow.

Build the Flow#

Unlike the initially uploaded datasets, the downstream datasets appear as outlines. This is because they have not been built, meaning that the relevant recipes have not been run to populate these datasets. However, this is not a problem because the Flow contains the recipes required to create these outputs at any time.

  1. Click to open the Flow Actions menu in the bottom right.

  2. Click Build all.

  3. Leaving the default options, click Build to run the recipes necessary to create the items furthest downstream.

  4. When the job completes, refresh the page to see the built Flow.

Dataiku screenshot of the dialog for building the Flow.
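
To picture what building the Flow involves, it can help to treat the Flow as a dependency graph. The sketch below is a toy model only, not Dataiku’s implementation: the FLOW dictionary is a hypothetical, simplified version of this project’s pipeline, and build_order is an illustrative helper built on Python’s standard graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified Flow: each dataset maps to the datasets it is built from.
FLOW = {
    "job_postings_prepared": {"job_postings"},
    "test_scored": {"job_postings_prepared"},
}

def build_order(flow, built):
    """List the datasets that still need building, upstream first."""
    order = TopologicalSorter(flow).static_order()
    return [name for name in order if name not in built]
```

With only the uploaded job_postings dataset built, build_order(FLOW, {"job_postings"}) returns the two downstream datasets in dependency order. Once everything is built, it returns an empty list.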

Collaborate in real-time through a browser#


Now that you have built the project, you might want to get straight to work. However, let’s take a moment to review a few collaboration principles.

Work in a browser#

One point not to be overlooked is that you access Dataiku through a web browser (rather than, for example, a desktop application). This has a number of advantages:

  • You can work with large datasets in a secure and governed way.

  • You can avoid lost time for data transfer across networks.

  • You can better track a project’s version history and user contributions.

Understand the groups-based permission framework#

A browser-based tool also enables a groups-based permission framework. Start by recognizing some basic details about your account.

  1. In the top right corner, click on the Profile icon.

  2. Click the gear icon to open Profile and settings.

  3. Find your user profile and the groups to which you belong.

Dataiku screenshot of the profile and settings page.

Common user profiles include Designer, Explorer, and Reader. As an example, if on a free trial, your profile will be Designer, and you’ll be a member of the designers and space_administrators groups.

Based on your group membership, you may have projects or workspaces shared with you, and permissions set for what you can do in these items (such as writing project content or only reading project content).

Assuming you created the job postings project yourself, you’ll be able to view the project’s security settings. These settings include information such as the project owner and the specific project permissions granted to each group or user.

  1. Return to the project (for example, using the back arrow in your browser).

  2. From the top navigation bar, open the More Options menu.

  3. Click on Security to view the permissions matrix for the project.

Dataiku screenshot of the permissions matrix.
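
One way to reason about a permissions matrix is as a mapping from groups to granted permissions. The sketch below is a toy model rather than Dataiku’s actual security engine: the group names, the permission labels, and the assumption that a user receives the union of the permissions of all their groups are illustrative only.

```python
# Hypothetical permissions matrix for one project: group -> granted permissions.
PROJECT_PERMISSIONS = {
    "designers": {"read", "write"},
    "readers": {"read"},
}

def effective_permissions(user_groups, matrix):
    """Combine the permissions granted to each of the user's groups."""
    granted = set()
    for group in user_groups:
        # Groups with no row in the matrix contribute nothing.
        granted |= matrix.get(group, set())
    return granted
```

In this model, a user in the designers and space_administrators groups ends up with read and write access from the designers row alone, while a user in no listed group gets no access to the project.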

Tip

Users can invite a colleague to their space from the Users, Profiles & Groups panel of their Launchpad.

Self-managed Dataiku users with the appropriate permissions can do the same from the Administration > Security > Users panel. Then, grant this user access to your project from the above Permissions panel of the Project security page!

Communicate with colleagues#

Once you have your colleagues on the same instance or space, you’ll be able to collaborate in real-time.

  1. Start discussions on objects, such as from the Discussions tab of the right sidebar.

    Dataiku screenshot of the discussions tab of an item.
  2. Manage requests and review discussions from your Inbox found in the Applications menu in the top right of the navigation bar.

    Dataiku screenshot of the inbox.

Share insights with a dashboard#


Once you start collaborating, you’ll need a way to communicate insights.

Dashboards are a key tool in Dataiku for sharing insights such as charts, custom webapps, metrics, machine learning model results, etc. They are particularly well suited for situations where different users on the project may require different levels of permission.

View the project’s dashboard#

Let’s start by exploring what’s already been created in the project’s dashboard.

  1. From the project’s top navigation bar, go to the Dashboards page (or use the keyboard shortcut g + p).

  2. Click to open the Project dashboard.

  3. Observe the insights about the model found in the project’s Flow.

Dataiku screenshot of the View tab of a dashboard.

Create an insight on a dashboard#

Let’s add more information to this dashboard.

  1. Click Edit.

  2. Click the + to add a new page.

  3. Click + New Tile to add to the dashboard.

  4. Choose Chart from the available dashboard tiles.

  5. Open the dropdown menu, and choose job_postings_prepared_joined as the source dataset.

  6. Click Add.

Dataiku screenshot of the dialog for adding a chart insight to a dashboard.

Let’s create a chart insight showing the count of records by country and the target variable fraudulent.

  1. From the available columns, drag Count of records to the Y axis.

  2. Drag country to the X axis.

  3. Drag fraudulent to the color droplet field.

  4. Click the dropdown next to fraudulent, and change the binning mode to None, use raw values.

  5. Open the chart picker dropdown, and select Vertical stacked bars.

  6. Click Save (or use the keyboard shortcut Cmd/Ctrl + s).

  7. Click Back to dashboard. You’ll see your new chart there.

Dataiku screenshot of a chart insight.
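
For readers who think in code, the aggregation behind this chart is a count of records grouped by two columns. The following sketch uses only the Python standard library. The sample rows are invented for illustration and are not taken from the project’s data; only the column names country and fraudulent come from the steps above.

```python
from collections import Counter

def count_by_country_and_target(rows):
    """Count records per (country, fraudulent) pair, keeping raw target values."""
    return Counter((row["country"], row["fraudulent"]) for row in rows)

# Invented sample rows, for illustration only.
sample = [
    {"country": "US", "fraudulent": "0"},
    {"country": "US", "fraudulent": "1"},
    {"country": "NZ", "fraudulent": "0"},
    {"country": "US", "fraudulent": "0"},
]
counts = count_by_country_and_target(sample)
```

Each key in counts corresponds to one colored segment of a stacked bar: the country selects the bar, and the fraudulent value selects the color. For the sample rows, counts[("US", "0")] is 2.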

Important

As reflected by the tag, the current insight is calculated on a sample of the data. Sampling ensures you can work interactively on even very large datasets. You can adjust this in the Sampling & Engine tab.

Now adjust the size of the insight on the dashboard page.

  1. Click on the chart tile, and drag the corners to a larger size (e.g. 9 x 6 squares).

  2. Click Save (or use the keyboard shortcut Cmd/Ctrl + s).

  3. Click View to see the current state of the dashboard.

Dataiku screenshot of the Edit tab of an insight.

Tip

Feel free to add more tile insights to this dashboard or learn more about dashboards in the Knowledge Base.

Run a colleague’s workload using an automation scenario#


In addition to being a place for communicating insights, dashboards can also be a tool to interact with project elements that colleagues have created.

For example, a team member may have embedded a custom webapp on a dashboard. You can use the webapp’s functionality through the dashboard.

Another good example of this pattern is scenarios. In Dataiku, scenarios are a set of actions to run, along with conditions for when they should execute. Although scenarios can trigger automatically based on factors like time or dataset changes, you can also trigger them manually — including from a dashboard.

This could be helpful for tasks such as:

  • Refreshing the Flow with the latest batch of data.

  • Exporting objects like dashboards, notebooks, reports, wikis, and other kinds of documentation.

  • Executing some SQL or Python code.

Note

See the Quick Start | Dataiku for MLOps for a walkthrough of your first scenario.

View a scenario#

As an example, take a look at the scenario in the project.

  1. From the Jobs menu in the top navigation bar, click on Scenarios.

  2. Click Score Data to open the scenario.

  3. Navigate to the Steps tab.

  4. Click on the steps to see what actions will run when the scenario is triggered.

Dataiku screenshot of the Steps tab of a scenario.

In many cases, you may not need to know all details of a colleague’s scenario, but this one is easy to understand. It rebuilds the furthest downstream test_scored dataset (and any necessary upstream dependencies), but only if the data quality rules on the upstream job_postings_prepared dataset pass verification.

Caution

If you created the project yourself, you’ll need to become the scenario’s last author in order to run it. To do this, make an arbitrary change in the scenario, and then save it. For example, on the Settings tab, change something about the trigger (which won’t be used anyway).

If another user shared the project with you (for example, you may be an Explorer), this user needs to have made this arbitrary change to become the scenario’s last author in order for you to execute the scenario on their behalf.

Add a scenario tile to a dashboard#

To have a convenient way of triggering a scenario, you can add a tile for the scenario to a dashboard page.

  1. Navigate back to the Project dashboard (g + p).

  2. Click Edit.

  3. Go to Page 2.

  4. Click + New Tile.

  5. Choose Scenario.

  6. In the dialog, choose the Run button option.

  7. Open the Source scenario dropdown, and select Score Data.

  8. Click Add.

  9. Adjust the size of the tile by dragging the corners, and click Save (Cmd/Ctrl + s).

Dataiku screenshot of the dialog for adding a scenario to a dashboard.

Run a scenario from a dashboard#

Now let’s trigger the scenario!

  1. Click View from the dashboard.

  2. Click Run Now to trigger the Score Data scenario.

  3. Click Logs on the pop-up notification or, beginning from the top navigation bar, go to Scenarios > Score Data > Last runs.

  4. Click on the job log for the build step to view the output.

Dataiku screenshot of the last runs tab of a scenario.
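
Scenarios can also be triggered from code. The sketch below assumes a client object that behaves like dataikuapi.DSSClient, with get_project, get_scenario, run_and_wait, and an outcome attribute on the returned run. These names reflect the dataikuapi client as best understood here, so check them against your installed version. The helper name run_scenario is hypothetical, and the project key and scenario ID are placeholders to look up on your own instance.

```python
def run_scenario(client, project_key, scenario_id):
    """Trigger a scenario, wait for it to finish, and return its outcome.

    `client` is assumed to behave like dataikuapi.DSSClient. The scenario ID
    is not necessarily the display name shown in the interface.
    """
    scenario = client.get_project(project_key).get_scenario(scenario_id)
    run = scenario.run_and_wait()  # blocks until the scenario run completes
    return run.outcome
```

A real call would pass a client created with dataikuapi.DSSClient(host, api_key). As with the dashboard tile, the account behind the API key needs permission to run scenarios in the project.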

Tip

Can you see why there was “nothing to do” for the build step? The short answer is build modes! Contrast this outcome with what happens when completing the next section!

Run a colleague’s workload using a Dataiku application#


Having a scenario run button on a dashboard reduces complexity for end users. But for many situations, end users may not even need access to the original project!

Dataiku applications enable users to package a project as a reusable application and share it with an audience of end users, such as business analysts. These end users can create their own instances of the application, and use it to complete their tasks without ever seeing the original project.

Your project has already been packaged as a Dataiku application. This application allows users to upload a dataset, apply the model to the uploaded data, and download the predictions using the same scenario you ran above.

View the Dataiku application designer#

Typically, as an end user of a Dataiku application, you won’t need to see the originating project. However, assuming you created this project, you can take a look! Skip ahead if you are on an Explorer profile.

  1. From the top navigation bar, open the More Options menu.

  2. Choose Application Designer.

  3. Scroll through the application, and try to understand how it can be used.

Dataiku screenshot of a Dataiku application from the Application designer.

Create an instance of a Dataiku application#

Similar to creating your own copy of the starter project for this tutorial, you need to create your own copy (or instance) of the Dataiku application.

  1. From the top navigation bar, open the waffle menu, and select Dataiku Applications.

  2. Click on the Score Job Postings application.

  3. Click Create App Instance.

  4. Give it a unique name, such as MYNAME SCORE JOBS.

  5. Click Create.

Dataiku screenshot of the dialog for creating an instance of a Dataiku application.

Use a Dataiku application#

The first field asks you to upload a dataset to be scored by the model. For this example, let’s use an export of the original job_postings dataset, but filtered for jobs in New Zealand.

  1. Download the nz_job_postings.csv file.

    Once you have a file to upload, you can use the application to produce a batch of model predictions.

  2. In the Upload data to be scored tile of your Dataiku application instance, click Add a File, and select the nz_job_postings.csv file.

  3. In the Generate predictions tile, click Run Now.

  4. When the run is finished, in the Download predictions tile, click Download.

Dataiku screenshot of a Dataiku application instance.

Although you never saw the original project, clicking Run Now triggered the same Score Data scenario that you ran from the dashboard in the previous section. This time, however, Dataiku detected a new upstream dataset in place of the job_postings dataset. Therefore, the scenario had actual work to do!
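
The difference between the two runs can be pictured with a toy model of a dependency-aware build. This is a deliberately simplified sketch, not Dataiku’s build logic: the chain of datasets is a hypothetical reduction of the Flow, and the function only captures the idea that outputs are recomputed when something upstream has changed.

```python
def datasets_to_rebuild(chain, changed_sources):
    """Return the datasets a dependency-aware build would recompute.

    chain: dataset names ordered from the source to the final output.
    changed_sources: names whose data changed since the last build.
    """
    for index, name in enumerate(chain):
        if name in changed_sources:
            # Everything downstream of the first changed dataset is out of date.
            return chain[index + 1:]
    return []  # nothing changed upstream, so there is nothing to do

# Hypothetical, simplified chain for this project.
CHAIN = ["job_postings", "job_postings_prepared", "test_scored"]
```

In this model, datasets_to_rebuild(CHAIN, set()) returns an empty list, which mirrors the dashboard run on an already built Flow. Passing {"job_postings"} mirrors the application run after a new file is uploaded, and returns both downstream datasets.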

Tip

Import the file you downloaded into Dataiku or any other data tool to confirm that it is indeed the same as the test_scored dataset found in the Flow — but including only results from New Zealand!
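
If you prefer to check the download in code, a short script can list the distinct countries it contains. The sketch below uses Python’s standard csv module on in-memory text. The column name country comes from the chart built earlier, while the sample rows and the "NZ" value are assumptions made for illustration, so adjust them to match the actual file.

```python
import csv
import io

def countries_in(csv_text, column="country"):
    """Return the set of distinct values found in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader}

# Invented sample standing in for the downloaded predictions file.
sample = "title,country,prediction\nNurse,NZ,0\nClerk,NZ,1\n"
```

For a file on disk, read its contents into a string first and pass that string to countries_in. A result containing a single value indicates that the predictions cover only one country.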

What’s next?#

Congratulations! You’ve taken your first steps toward AI collaboration with Dataiku.

If you are interested in learning more about Designer capabilities, please see the quick starts for data preparation, machine learning, or MLOps.

More generally, visit the Dataiku Academy for learning paths and certifications, the majority of which can be completed using the free trial or free edition.

Note

You can also find more Dataiku resources in the following spaces: