Tutorial | Create a scenario for automating project maintenance macros#

Introduction and definitions#

Project cleaning and maintenance can quickly become repetitive tasks. Built-in maintenance macros let you automate these tasks for one or all projects on your instance. To apply them across a Dataiku DSS (DSS) instance, you create an Admin Project that runs the macros from a scenario.

At the end of this article, you’ll be able to perform the following tasks:

  • Create an Admin Project and run maintenance macros.

  • Describe the DSS project cleaning and maintenance macros.

  • Create a scenario to automate cleaning and maintenance tasks.

What is an Admin Project?

An Admin Project is a blank project that you create and make accessible only to administrators on the instance. It contains a scenario with one step for each macro you want to execute.

After following the steps described in this article, your Admin Project will look like this:

Automation scenario with steps to execute maintenance macros on all projects on the instance.

Create the admin project#

Let’s start with a new, blank project. Create the project while signed in to the instance as a user with administrator privileges. Without administrator privileges, you will not have the option to apply macros to all projects on the instance when you add steps to your scenario.

  1. In your DSS instance, create a new, blank project.

  2. Give it a name, such as Admin Project.

Next, we’ll set the project visibility and permissions:

  1. From the top navigation bar in your project, navigate to the More Options (horizontal dots icon) menu and choose Security.

  2. Set the Project visibility to Private.

  3. Set Project access requests to Disabled.

We’ll want administrators to have access to this project, so let’s grant access to the administrators group.

  1. Choose administrators and click Grant Access to Group.

  2. Select Admin to give administrators Admin permissions.

    Project security settings for a project.

  3. Save your changes.
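
The security settings above can be pictured as a small data structure. The following is a minimal sketch in plain Python. The dictionary keys (`visibility`, `access_requests`, `permissions`, `group`, `admin`) and the function name are hypothetical, chosen for illustration only, and are not guaranteed to match the format DSS uses internally.

```python
def apply_admin_project_security(security, group="administrators"):
    """Return a copy of `security` that is private and grants Admin to `group`.

    All keys used here are hypothetical and only mirror the UI choices
    made in the steps above.
    """
    updated = dict(security)
    updated["visibility"] = "PRIVATE"      # Project visibility: Private
    updated["access_requests"] = False     # Project access requests: Disabled
    # Keep entries for other groups, and replace any existing entry for `group`.
    others = [p for p in security.get("permissions", []) if p.get("group") != group]
    updated["permissions"] = others + [{"group": group, "admin": True}]
    return updated


# A new, blank project has no group permissions yet.
security = apply_admin_project_security({"visibility": "PUBLIC", "permissions": []})
print(security)
```

The function returns a new dictionary instead of modifying its input, which makes it easy to compare the settings before and after.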

Our next step is to configure a scenario to run our macros.

Automate project cleaning and maintenance#

You can run macros manually or automatically from a scenario step. In this section, we’ll create a scenario to execute five maintenance macros. These five macros are recommended as part of project maintenance best practices:

  • Clear job logs

  • Clear scenario run logs

  • Kill Jupyter sessions

  • Clear internal databases (if you are using the default internal H2 database)

    Note

    For production environments, the use of an externally hosted PostgreSQL runtime database is recommended. Visit Externally hosting runtime databases for more information.

  • Clear continuous activities logs

    Note

    Clearing continuous activities is recommended if you are using streaming features.

Let’s configure a scenario to run the recommended macros.

Create a scenario#

To create your scenario:

  1. From the Jobs (play button icon) menu, navigate to the Scenarios panel, and create a new scenario.

  2. Ensure Sequence of steps is selected, and name it Maintenance Scenario.

Add a trigger#

Let’s create a time-based trigger so that our scenario runs each hour.

  1. Within the Triggers panel of the Settings tab, click the Add Trigger dropdown menu.

  2. Add a Time-based trigger.

  3. Instead of the default Time-based, name it Every 1 hour.

  4. Change Repeat every to 1 hours.

  5. Make sure its activity status is toggled to ON.

Scenario with a time-based trigger.
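
To see what a one-hour repeat interval means in practice, here is a minimal sketch in plain Python that lists the next few run times after a given start. It only models a fixed interval. How DSS aligns its time-based triggers to the clock is not covered here, and the function name `next_runs` is hypothetical.

```python
from datetime import datetime, timedelta


def next_runs(start, every_hours=1, count=3):
    """Return the first `count` run times after `start`, spaced `every_hours` apart."""
    step = timedelta(hours=every_hours)
    return [start + step * i for i in range(1, count + 1)]


runs = next_runs(datetime(2024, 1, 1, 8, 0), every_hours=1, count=3)
for run in runs:
    print(run.isoformat())
```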

Next, we’ll add steps, one for each macro we want to run. We’ll configure each step to execute the macro on all projects on the instance.

Add a step to clear job logs#

Our first step will run a macro to clear job logs.

  1. Navigate to the Steps tab.

  2. From the Add Step dropdown, choose to add an Execute macro step.

  3. Name it Clear job logs then select Clear job logs as the macro.

  4. Set the Max age (days) to 9.

  5. Select Perform deletion.

  6. Select All projects.

    This tells DSS to delete logs older than nine days for all projects on the instance.

    Maintenance scenario with a step to run a macro for clearing job logs.

  7. Save your changes.
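
The effect of the Max age (days) and Perform deletion settings can be sketched in plain Python. This is an illustration of age-based cleanup in general, not the macro's actual implementation. It assumes that leaving Perform deletion unselected only reports what would be removed. The names `select_expired` and `clear_logs` and the sample log entries are hypothetical.

```python
from datetime import datetime, timedelta


def select_expired(logs, now, max_age_days=9):
    """Return the names of logs that finished more than `max_age_days` before `now`."""
    cutoff = now - timedelta(days=max_age_days)
    return [name for name, finished in logs.items() if finished < cutoff]


def clear_logs(logs, now, max_age_days=9, perform_deletion=False):
    """Report expired logs and remove them only when `perform_deletion` is True."""
    expired = select_expired(logs, now, max_age_days)
    if perform_deletion:
        for name in expired:
            del logs[name]
    return expired


now = datetime(2024, 1, 20)
logs = {
    "job_a": datetime(2024, 1, 5),   # 15 days old: expired
    "job_b": datetime(2024, 1, 15),  # 5 days old: kept
}

reported = clear_logs(logs, now)                        # dry run, nothing removed
count_after_dry_run = len(logs)
deleted = clear_logs(logs, now, perform_deletion=True)  # removes expired entries
print(reported, count_after_dry_run, deleted, sorted(logs))
```

Running a dry run first is a cheap way to check which entries a given maximum age would remove before deleting anything.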

Add a step to clear scenario run logs#

This step will clear scenario run logs.

  1. From the Add Step dropdown, choose to add an Execute macro step.

  2. Name it Clear scenario run logs then select Clear scenario run logs as the macro.

  3. Set the Max age (days) to 9.

  4. Select Perform deletion.

  5. Select All projects.

This tells DSS to delete all logs and temporary files of scenario runs that are older than nine days.

Add a step to kill Jupyter sessions#

Let’s add a step to kill Jupyter sessions. This frees up memory on the instance when Jupyter sessions have been running or idle for too long.

  1. From the Add Step dropdown, choose to add an Execute macro step.

  2. Name it Kill Jupyter sessions then select Kill Jupyter sessions as the macro.

  3. Leave the default settings.

This tells DSS to kill old and unused Jupyter sessions.

Add a step to clear internal databases#

Clearing the internal databases trims accumulated history, which can help resolve performance degradation.

  1. From the Add Step dropdown, choose to add an Execute macro step.

  2. Name it Clear internal databases then select Clear internal databases as the macro.

  3. Select Clear for all projects.

  4. Set Max age to 9.

This tells DSS to truncate jobs, scenarios and metrics histories for all projects.

Add a step to clear continuous activities logs#

When working with continuous activities such as streaming features, the continuous activities logs and temporary files can grow very quickly. Therefore, we might want to delete the logs and temporary files older than a certain number of days.

  1. From the Add Step dropdown, choose to add an Execute macro step.

  2. Name it Clear continuous activities logs then select Clear continuous activities logs as the macro.

  3. Set the Max age (days) to 9.

  4. Select Perform deletion.

  5. Select All projects.

    This tells DSS to delete all logs and temporary files of continuous activities older than nine days.

  6. Save your changes.
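
With all five steps in place, the scenario can be summarized as an ordered list of macro steps. The sketch below models that list in plain Python so the settings can be reviewed at a glance. The helper `macro_step` and every dictionary key are hypothetical and do not reflect how DSS stores scenario definitions.

```python
def macro_step(macro_name, **params):
    """Describe one Execute macro step; the step is named after its macro."""
    return {"name": macro_name, "macro": macro_name, "params": params}


# Settings shared by the three log-clearing macros in this tutorial.
log_cleanup = {"max_age_days": 9, "perform_deletion": True, "all_projects": True}

steps = [
    macro_step("Clear job logs", **log_cleanup),
    macro_step("Clear scenario run logs", **log_cleanup),
    macro_step("Kill Jupyter sessions"),  # default settings
    macro_step("Clear internal databases", all_projects=True, max_age=9),
    macro_step("Clear continuous activities logs", **log_cleanup),
]

for position, step in enumerate(steps, start=1):
    print(position, step["name"], step["params"])
```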

Trigger the scenario to run as admin#

Now that we have configured our scenario, let’s set it to run automatically by enabling auto-triggers. We’ll also tell DSS to run the scenario as admin, because administrator privileges are required to run maintenance macros on every project on the instance.

  1. Within the Run panel of the Settings tab, toggle the Auto-triggers to ON.

  2. Set the Run as option to admin.

  3. Leave the other settings at their defaults.

  4. Save your changes.

You can now run the scenario.

Scenario configured to auto trigger and run as admin.
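
Besides the UI, a scenario can also be started from a script. The sketch below assumes a client object that exposes `get_project`, `get_scenario`, and `run_and_wait`, as the public `dataikuapi` Python client does. Check these names against the version installed on your instance. The host, API key, project key, and scenario ID in the usage comment are placeholders.

```python
def run_maintenance(client, project_key, scenario_id):
    """Run the scenario and block until it finishes; return the run handle.

    `client` is assumed to behave like a dataikuapi DSSClient.
    """
    project = client.get_project(project_key)
    scenario = project.get_scenario(scenario_id)
    return scenario.run_and_wait()


# Example usage (requires a reachable instance and an admin API key;
# all values below are placeholders):
#
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
#   run_maintenance(client, "ADMINPROJECT", "MAINTENANCESCENARIO")
```

Passing the client in as an argument keeps the function independent of how the connection is created.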