Data Connections

Explore resources for working with connections to data sources from a user’s perspective.

Tip

To validate your knowledge of this area, register for the Academy courses Integration with SQL Databases and Dataiku & SQL, optional parts of the Core Designer learning path.

Reference | A primer on connecting to data sources

Connect to your existing infrastructure

SQL databases

The list of supported SQL databases, and information on how to connect to them, is available from our documentation on SQL datasets.

Hadoop HDFS

To connect, you first need to configure Hadoop on your instance. Please refer to the detailed guides on specific Hadoop distributions and managed services.

NoSQL

Accessing cloud storage and databases

Cloud File Storage

Cloud Databases

Fetching data from remote sources

It is possible to fetch data using various protocols and cache the resulting dataset on the filesystem.
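As a rough illustration of the fetch-then-cache idea (not Dataiku's implementation), the sketch below downloads a remote file once and serves later requests from a local cache directory. The function names, the cache layout, and the choice of plain HTTP via the standard library are all assumptions made for this example.

```python
import hashlib
import urllib.request
from pathlib import Path


def http_fetch(url):
    """Download the raw bytes at url with a plain HTTP(S) GET."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_cached(url, cache_dir, fetch=http_fetch):
    """Return a local path holding the content of url.

    The download only happens when no cached copy exists yet.
    `fetch` is a hypothetical hook so another protocol can be plugged in.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe, stable file name.
    target = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not target.exists():
        target.write_bytes(fetch(url))
    return target
```

Calling `fetch_cached` twice with the same URL triggers a single download; the second call simply returns the cached file path.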

File formats

Dataiku can read and write various file formats on file-based connections: filesystem, HDFS, Amazon S3, HTTP, FTP, SSH… See the list of readable file formats.

Accessing data through plugins

Many applications, such as Google Sheets, Salesforce, and Slack, provide access to their data through APIs. Dataiku plugins can add custom connections that leverage these APIs, making it easy to define datasets that fetch data from a wide variety of applications.

See the available plugins or create your own plugin.

How-to | Utilize MS Access

Many users have asked about using MS Access files in Dataiku. This how-to shows how to connect to one through the UCanAccess JDBC driver.

How to open an MS Access file

  • Download UCanAccess.

  • Copy ucanaccess-4.0.4.jar and jackcess-2.1.11.jar, along with the jars from the lib/ directory of the download, into DATA_DIR/lib/jdbc/ (see Installing database drivers for more details).

  • Restart Dataiku.

  • Configure a new connection:

    • JDBC driver class: net.ucanaccess.jdbc.UcanaccessDriver

    • JDBC URL: jdbc:ucanaccess://<filepath>, e.g. jdbc:ucanaccess:///home/johndoe/Documents/fooDB.accdb

    • In the “SQL Dialect” field, select “MySQL < 8”

  • Create a new dataset using this connection to import an MS Access table in Dataiku.
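Before restarting Dataiku, it can help to check the setup from the steps above. The following is a minimal sketch, not part of Dataiku: the helper names are hypothetical, and it assumes the jars keep their original file name prefixes (ucanaccess, jackcess). It reports which required jars are missing from the driver directory and builds the JDBC URL in the jdbc:ucanaccess://<filepath> form.

```python
from pathlib import Path

# Jar name prefixes taken from the steps above (assumed to be unchanged on disk).
REQUIRED_JAR_PREFIXES = ("ucanaccess", "jackcess")


def missing_jars(jdbc_dir):
    """Return the required jar prefixes with no matching .jar file in jdbc_dir."""
    present = [p.name for p in Path(jdbc_dir).glob("*.jar")]
    return [
        prefix
        for prefix in REQUIRED_JAR_PREFIXES
        if not any(name.startswith(prefix) for name in present)
    ]


def ucanaccess_url(db_path):
    """Build a JDBC URL of the form jdbc:ucanaccess://<filepath>."""
    return "jdbc:ucanaccess://" + str(db_path)
```

For example, `ucanaccess_url("/home/johndoe/Documents/fooDB.accdb")` returns `jdbc:ucanaccess:///home/johndoe/Documents/fooDB.accdb`, matching the example URL above, and `missing_jars` returns an empty list once both jars are in place.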

Connecting to an MS Access file with a Date/Time data type

Follow the steps above, making sure to use UCanAccess 4.0.4.
