Product Pillar: Inclusive Advanced Analytics

The pillar of Inclusive Advanced Analytics seeks to make AI more widespread and relevant by opening it up to a wider population within the enterprise.

../../../_images/01_GRAPHIC_PILLARS_inclusive-advanced.png

The ongoing transformation to AI happening within organizations cannot be limited to data scientists. Throughout the entire analytics lifecycle, personas across the enterprise need to collaborate on a shared platform. This kind of inclusive work leads to both better oversight across complementary teams and more trusted results across the organization. A common workspace lets all profiles build their skills gradually, and it mitigates the risk of AI solutions being designed by experts and rolled out abruptly without buy-in from all stakeholders.

This vision for inclusive advanced analytics manifests itself in the following feature categories:

  • A unified visual abstraction

  • A skill-agnostic platform

  • Extensibility through plugins

  • Discussions and wikis for team collaboration

A Unified Visual Abstraction

At the core of Dataiku DSS is the idea of a unified visual abstraction for the analytics pipeline, termed the Flow.

The Flow is a single, visual representation of work in a DSS project as a set of dependencies between datasets and the recipes (steps for data transformation) used to produce them. This unified abstraction allows for a clearly mapped data process and transformation, from raw data to prediction and other outputs.

The clear mapping of the Flow tracks the lineage and dependencies of all project objects, such as datasets, recipes, and models. Accordingly, DSS is able to dynamically and intelligently rebuild outputs in the Flow whenever one of their parent datasets or recipes has been modified. This makes the Flow reusable, maintainable, and portable to various execution engines.
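The dependency tracking described above can be illustrated with a small, self-contained sketch. It is not how DSS is implemented. It only assumes that a Flow can be modeled as a directed acyclic graph in which each object lists its parents. The object names (`trips_raw`, `prepare_trips`, and so on) and the helper functions are hypothetical and chosen for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy Flow: each object maps to the parents it is built from.
# Names are illustrative only and are not taken from a real DSS project.
FLOW = {
    "trips_raw": set(),                    # source dataset
    "prepare_trips": {"trips_raw"},        # recipe
    "trips_prepared": {"prepare_trips"},   # dataset produced by the recipe
    "train_model": {"trips_prepared"},     # machine learning recipe
    "fare_model": {"train_model"},         # model
    "weather_raw": set(),                  # unrelated source dataset
}


def downstream_of(flow, changed):
    """Return every object that depends, directly or indirectly, on `changed`."""
    stale = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for node, parents in flow.items():
            if current in parents and node not in stale:
                stale.add(node)
                frontier.append(node)
    return stale


def rebuild_order(flow, changed):
    """List the stale objects so that parents are always rebuilt before children."""
    stale = downstream_of(flow, changed)
    ordered = TopologicalSorter(flow).static_order()
    return [node for node in ordered if node in stale]


print(rebuild_order(FLOW, "trips_raw"))
print(rebuild_order(FLOW, "weather_raw"))
```

In this toy graph, modifying `trips_raw` marks everything from `prepare_trips` through `fare_model` for rebuilding, while modifying `weather_raw` marks nothing, because no other object depends on it.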

../../../_images/intro-flow.png

The visual grammar of DSS can be seen in the Flow of the NY Taxi Fares project:

  • Blue squares represent datasets.

  • Yellow circles represent visual recipes.

  • Orange circles represent code recipes.

  • Red circles represent plugin recipes.

  • Green circles and diamonds represent machine learning recipes and models.

The icon within a particular shape communicates more information about the object, such as the storage of the dataset, the type of visual recipe, or the language of the code recipe.

Recipes are explained in greater detail in later courses, but can be understood as a repeatable set of actions for manipulating and transforming datasets.
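As a rough illustration of "a repeatable set of actions", the sketch below models a recipe as an ordered list of plain Python functions applied to rows. This is an assumption made for teaching purposes, not the DSS recipe API. The step names, the `run_recipe` helper, and the sample rows are all hypothetical.

```python
def drop_missing_fare(rows):
    """Step 1: keep only rows that have a fare value."""
    return [row for row in rows if row.get("fare") is not None]


def add_fare_per_mile(rows):
    """Step 2: derive a new column from two existing ones."""
    return [{**row, "fare_per_mile": row["fare"] / row["miles"]} for row in rows]


def run_recipe(steps, rows):
    """Apply each step in order; the input rows are never modified in place."""
    for step in steps:
        rows = step(rows)
    return rows


# Hypothetical recipe definition and made-up input data.
PREPARE_STEPS = [drop_missing_fare, add_fare_per_mile]
raw = [{"fare": 12.0, "miles": 3.0}, {"fare": None, "miles": 1.5}]

prepared = run_recipe(PREPARE_STEPS, raw)
print(prepared)
```

Because the steps are stored rather than performed by hand, running the same recipe again on the same input yields the same output, which is what makes the transformation repeatable.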

A Skill-Agnostic Platform

Scaling AI requires ingraining a culture of working with data throughout the enterprise instead of siloing it into a specific team or role. For this reason, Dataiku DSS is a skill-agnostic platform that provides a common ground where personas across the enterprise can effectively collaborate.

For individuals with a preference for visual tools, DSS provides a visual user interface for the entire AI lifecycle. Individuals who prefer to write code also have a first-class environment for doing so. Moreover, coders in the enterprise are free to extend and customize the visual capabilities of DSS whenever necessary, making their non-coding colleagues more productive.

This enablement of all personas, regardless of their skill set and preferences, gives teams the resources to work faster and smarter for a more data-driven organization.

../../../_images/intro-lab.png

The Lab is a place within DSS for experimentation. On the left are options for clickers - users who prefer to accomplish their tasks clicking in a visual UI. On the right are options for coders.

Extensibility through Plugins

DSS has been designed with extensibility as a core tenet.

Coders within an enterprise can augment the native capabilities of DSS through a plugin system to make all of their colleagues more productive. This system allows coders to package specific, in-house processes into click-based reusable components.

These custom extensions can be written by end users or contributed by the broader Dataiku community. They can be found in the Plugin Store, uploaded from a zip file, or fetched from a Git repository.

Plugins can include components such as recipes, datasets, or even web apps. Accordingly, they can be written in languages such as Python and R, or HTML/CSS/JS.

../../../_images/intro-geocoder-plugin.png

In addition to predicting fares, this project also experiments with geocoding and time series plugins for a fare revenue forecast. Here the Geocoder plugin fetches geographic coordinates for an address from the provider of our choice (in this case ArcGIS). Through a simple visual interface, every parking garage in the dataset now has longitude and latitude coordinates.

Macros are another example of an extensible component that can be created with the plugin system. Macros are predefined actions that allow users to automate tasks. With a macro, repetitive coding tasks are packaged and exposed through a visual UI. Such tasks include maintenance and diagnostics, connectivity tasks for importing data, and report generation.

Some macros are provided as part of DSS, but enterprises can also develop their own macros in Python for specific purposes. The right to use a macro can also be limited to specific groups of users, such as administrators.
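The idea of a predefined, parameterized action whose use can be limited to certain groups can be sketched in plain Python. The `Macro` class, its fields, and the `report_action` function below are hypothetical stand-ins, not the interface DSS expects from a plugin macro.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class Macro:
    """Hypothetical stand-in for a macro: a named action plus access rules."""
    name: str
    action: Callable[[Dict[str, str]], str]
    # An empty set means that any user may run the macro.
    allowed_groups: FrozenSet[str] = field(default_factory=frozenset)

    def run(self, user_groups: Iterable[str], params: Dict[str, str]) -> str:
        # Refuse to run when the macro is restricted and the user is in none
        # of the allowed groups.
        if self.allowed_groups and not (self.allowed_groups & set(user_groups)):
            raise PermissionError(
                f"{self.name} is restricted to {sorted(self.allowed_groups)}"
            )
        return self.action(params)


def report_action(params: Dict[str, str]) -> str:
    """Illustrative task; a real macro would do the actual work here."""
    return f"report generated for project {params['project']}"


generate_report = Macro(
    name="generate_report",
    action=report_action,
    allowed_groups=frozenset({"administrators"}),
)

print(generate_report.run(["administrators"], {"project": "NYTAXI"}))
```

The parameters dictionary plays the role of the form a user would fill in through the visual UI, so the person running the macro supplies values without touching the underlying code.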

../../../_images/intro-macros.png

The macros page for the NY Taxi Fares project provides a simple UI for diverse tasks like deleting datasets by a tag filter or generating new access tokens to interact with the Power BI API.

Discussions & Wikis for Team Collaboration

Open communication and good documentation of activity are important components of collaboration.

Within a DSS project, users can initiate Discussions around any object, such as a particular dataset, recipe, model, or dashboard. These same objects can have descriptions and to-do lists attached to them. Users can also star objects and control notification settings to be alerted of certain project activities. These messages and notifications can be managed from a centralized inbox.

../../../_images/intro-discussion.png

The discussion here is attached to the project homepage. Tagging colleagues directly alerts them to the conversation.

For project documentation, DSS supports centralized and hierarchical Markdown-based Wiki pages to ensure every team member has a high-level understanding of the project’s existing work. Sample templates can be created to make the job of documentation easier. Wikis can also reference DSS objects.

../../../_images/intro-wiki.png

The wiki for the NY Taxi Fares project includes a Project README and a collection of additional project resources.