Product Pillar: Sustainable Governance & Processes

The Sustainable Governance & Processes pillar seeks to ensure that all data initiatives are properly governed in terms of security, auditability, and process.

../../../_images/05_GRAPHIC_PILLARS_governance.png

At data-centric companies, the amount of data available continues to grow exponentially. This creates opportunities for more analytics initiatives, more models, more people involved, and more projects that must be pushed to production and monitored. Models themselves have become the lifeblood of many industries.

However, scalability is not just a computational challenge. The value from all of these data-driven initiatives is closely tied to the degree of trust throughout the organization in the integrity of the data at each stage of the analytics pipeline. In fact, in many industries, the regulatory environment now demands documentation of processes behind ML models.

For an organization at any point of maturity on its path to Enterprise AI, DSS makes it possible to manage the complexity of this undertaking while building confidence and transparency in these processes.

../../../_images/intro-complexity.png

Data Governance

Enterprises need processes in place to ensure that their data is of high quality (its complete lineage is traceable), usable (it can readily be found, and permissions are clearly documented), and secure (only those who should have access actually do).

Having all processes visually mapped in the Flow makes it easy to trace the lineage of any dataset back to its origin, even when the dataset is shared as an exposed object from another DSS project.
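To make the idea of lineage concrete, here is a minimal sketch that models a Flow as a plain mapping from each dataset to the datasets it is built from, then walks upstream to the original sources. It does not use any DSS API. The dataset names, the `FLOW_INPUTS` mapping, and the `PROJECT.dataset` notation for an exposed object are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical Flow: each dataset maps to the datasets its recipe reads from.
# "OTHERPROJECT.products" stands in for a dataset exposed by another project.
FLOW_INPUTS = {
    "customers_scored": ["customers_prepared"],
    "customers_prepared": ["customers_raw", "orders_joined"],
    "orders_joined": ["orders_raw", "OTHERPROJECT.products"],
}


def upstream_lineage(dataset, flow_inputs):
    """Return every dataset upstream of `dataset`, breadth-first, without duplicates."""
    seen = []
    queue = deque(flow_inputs.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        # Keep walking through whatever this dataset was built from.
        queue.extend(flow_inputs.get(current, []))
    return seen


def origin_datasets(dataset, flow_inputs):
    """Return only the upstream datasets that have no inputs of their own."""
    return [d for d in upstream_lineage(dataset, flow_inputs) if not flow_inputs.get(d)]


if __name__ == "__main__":
    print(upstream_lineage("customers_scored", FLOW_INPUTS))
    print(origin_datasets("customers_scored", FLOW_INPUTS))
```

In this toy Flow, the origins of `customers_scored` are the two raw datasets plus the dataset shared from the other project, which is the same question the visual Flow answers at a glance.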

Moreover, unlike many other tools with a visual component, DSS projects are under version control. Every action in a DSS project, including those made in the visual UI, is recorded in a Git repository. This ensures traceability of all actions performed in DSS, the ability to understand the history of each object, and the ability to revert changes when necessary.

../../../_images/intro-version-control.png

The Version Control page shows the full commit history of the project. From here we can investigate the details of any commit and revert changes if necessary. The commit history for a specific object in a project, such as a dataset or a recipe, can be found in the object’s own History tab.

For even finer detail of past actions, DSS admins have access to an audit trail that logs all actions performed by users, with details about the user ID, timestamp, IP address, and authentication method.

Beyond version control, it is easy to monitor the team's activity. The Activity tab from the project homepage shows a summary of the commit history and contributor activity. Users can also star objects and adjust notification settings to be alerted to specific project activities.

../../../_images/intro-activity.png

The Activity tab from the project homepage provides visualizations, such as waffle charts and punch cards, that allow users to monitor contributor activity.

When the number of users contributing to a project increases, it is important to be able to control user permissions and monitor all changes to a project.

DSS employs a groups-based permissions model. The basic principle is that admins assign users to groups and then grant each group a set of permissions on each project. This system allows enterprises to control access to data at a very granular level. Permissions include options such as “Read project content”, “Write project content”, and “Run scenarios”. Admins can also restrict access to connections based on group membership.

../../../_images/intro-project-security.png

In the project-level security settings, users with admin privileges can review which permissions have been assigned to each group for this specific project.
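The groups-based principle described above can be sketched in a few lines of plain Python. This is an illustrative model only, not how DSS implements security: the `PermissionModel` class, the user and group names, and the `CHURN` project key are hypothetical, and the permission labels are simply the ones quoted in the text. The sketch also treats each permission as independent and does not model any implication between them.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class PermissionModel:
    # user -> groups the user belongs to
    user_groups: Dict[str, Set[str]] = field(default_factory=dict)
    # (project, group) -> permissions granted to that group on that project
    grants: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def add_user_to_group(self, user: str, group: str) -> None:
        self.user_groups.setdefault(user, set()).add(group)

    def grant(self, project: str, group: str, permission: str) -> None:
        self.grants.setdefault((project, group), set()).add(permission)

    def permissions_for(self, user: str, project: str) -> Set[str]:
        """Union of the permissions of every group the user belongs to."""
        result: Set[str] = set()
        for group in self.user_groups.get(user, set()):
            result |= self.grants.get((project, group), set())
        return result

    def can(self, user: str, project: str, permission: str) -> bool:
        return permission in self.permissions_for(user, project)


# Hypothetical setup: two users, two groups, one project.
model = PermissionModel()
model.add_user_to_group("alice", "data_scientists")
model.add_user_to_group("bob", "analysts")
model.grant("CHURN", "data_scientists", "Write project content")
model.grant("CHURN", "data_scientists", "Run scenarios")
model.grant("CHURN", "analysts", "Read project content")

if __name__ == "__main__":
    print(model.can("alice", "CHURN", "Run scenarios"))
    print(model.can("bob", "CHURN", "Write project content"))
```

Because permissions attach to groups rather than to individual users, onboarding a new analyst in this model is a single `add_user_to_group` call; no per-project changes are needed.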

Management APIs

Remotely managing machine learning models is only a small fraction of what can be achieved with DSS via APIs. Management APIs allow users to perform a comprehensive set of administrative and maintenance operations. Examples include the ability to:

  • Manage security settings, such as group permissions

  • Manage connections to various data stores

  • Populate a project with datasets, recipes, and models according to preset configurations

  • Spin up or take down clusters as needed on cloud infrastructures

  • Automate scenarios, metrics, and checks to manage models in production
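As an illustration of the first item in the list above, the following sketch grants a group read access to a project through the public Python client (`dataikuapi`). Treat it as a sketch under assumptions rather than a reference: the helper names `grant_read_access` and `apply_read_access` are hypothetical, the host, API key, project key, and group name are placeholders, and the dictionary keys (`permissions`, `group`, `readProjectContent`) reflect the permissions structure as exposed by the client at the time of writing, so they should be checked against the API reference for your DSS version.

```python
import copy


def grant_read_access(permissions, group):
    """Return a copy of a project permissions dict in which `group` can read content.

    Assumes the structure {"permissions": [{"group": ..., "readProjectContent": ...}]}.
    """
    updated = copy.deepcopy(permissions)
    entries = updated.setdefault("permissions", [])
    for entry in entries:
        if entry.get("group") == group:
            # The group already has an entry: only switch on read access.
            entry["readProjectContent"] = True
            return updated
    entries.append({"group": group, "readProjectContent": True})
    return updated


def apply_read_access(host, api_key, project_key, group):
    """Read the current permissions from DSS, update them, and write them back."""
    # Imported here so the pure helper above works without the client installed.
    import dataikuapi

    client = dataikuapi.DSSClient(host, api_key)
    project = client.get_project(project_key)
    current = project.get_permissions()
    project.set_permissions(grant_read_access(current, group))


if __name__ == "__main__":
    # Placeholder values; replace with a real instance URL, key, and project.
    # apply_read_access("https://dss.example.com", "YOUR_API_KEY", "MYPROJECT", "analysts")
    sample = {"owner": "admin", "permissions": []}
    print(grant_read_access(sample, "analysts"))
```

Keeping the dictionary manipulation separate from the network calls makes the change easy to review and test before it is applied to a live instance.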

The management APIs give enterprises an entirely different way of operating DSS, either instead of or, more often, in conjunction with the visual interface. Because no two enterprises follow the same path to AI, the flexibility to operate DSS both visually and programmatically through APIs can be an important component of an enterprise-scaling strategy.

../../../_images/intro-mgmt-api.png