Data Connections¶
Explore resources for working with connections to data sources from a user’s perspective.
Tip
To validate your knowledge of this area, register for the Academy courses Integration with SQL Databases and Dataiku & SQL, optional parts of the Core Designer learning path.
Tutorials¶
- Tutorial | Configure a connection between Dataiku and PostgreSQL (SQL part 0)
- Tutorial | Data transfer with the Sync recipe (SQL part 1)
- Tutorial | Data transfer with the Prepare recipe (SQL part 2)
- Tutorial | Remap connections in a Dataiku instance
- Tutorial | Integration with Amazon Redshift
- Tutorial | Integration with MongoDB
Reference | A primer on connecting to data sources¶
Connect to your existing infrastructure¶
SQL databases
The list of supported SQL databases, and information on how to connect to them, is available from our documentation on SQL datasets.
Hadoop HDFS
To connect, you will first need to configure Hadoop on your instance. Please refer to the detailed guides on specific Hadoop distributions and managed services.
NoSQL
Accessing cloud storage and databases¶
Cloud File Storage
Cloud Databases
Amazon Redshift. Syncing from S3 to Redshift is the most efficient route, and Dataiku takes it whenever possible. See this page for details.
Google BigQuery is available through a JDBC driver developed by Simba.
Fetching data from remote sources¶
It is possible to fetch data using various protocols and cache the resulting dataset on the filesystem.
You can use an FTP server as an uncached database.
You can also fetch data using HTTP, and use secured connections through SFTP or SCP.
File formats¶
Dataiku can read and write various file formats on file-based connections: filesystem, HDFS, Amazon S3, HTTP, FTP, SSH… See the list of readable file formats.
Accessing data through plugins¶
Many applications, such as Google Sheets, Salesforce, Slack…, expose their data through APIs. Dataiku plugins can add custom connections that use these APIs, making it easy to define datasets that fetch data from a wide variety of applications.
See the available plugins or create your own plugin.
How-to | Utilize MS Access¶
Many of our users have shown interest in utilizing MS Access in Dataiku. In the interest of knowledge sharing, we wanted to demonstrate how to do just that.
How to open an MS Access file¶
- Download UCanAccess.
- Copy ucanaccess-4.0.4.jar and jackcess-2.1.11.jar (including the ones from lib/) into DATA_DIR/lib/jdbc/ (see Installing database drivers for more details).
- Restart Dataiku.
- Configure a new connection:
  - JDBC driver class: net.ucanaccess.jdbc.UcanaccessDriver
  - JDBC URL: jdbc:ucanaccess://<filepath>, e.g. jdbc:ucanaccess:///home/johndoe/Documents/fooDB.accdb
  - In the “SQL Dialect” field, select “MySQL < 8”
- Create a new dataset using this connection to import an MS Access table in Dataiku.
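The connection settings above can be double-checked with a small script before entering them in Dataiku. The following is only a minimal sketch in plain Python, not part of Dataiku: the helper names ucanaccess_url and missing_jars are hypothetical, the URL helper assumes an absolute POSIX-style file path, and the jar check only looks for the two jars named in the steps (it does not cover the additional jars from lib/).

```python
from pathlib import Path, PurePosixPath
from typing import List

# Driver class and URL prefix exactly as given in the steps above.
DRIVER_CLASS = "net.ucanaccess.jdbc.UcanaccessDriver"
URL_PREFIX = "jdbc:ucanaccess://"

# Only the two jars named in the steps; the extra jars from lib/ are not checked.
REQUIRED_JARS = ("ucanaccess-4.0.4.jar", "jackcess-2.1.11.jar")


def ucanaccess_url(db_path: str) -> str:
    """Build the JDBC URL for an absolute POSIX path to an MS Access file.

    Hypothetical helper: it simply appends the path to the URL prefix.
    """
    path = PurePosixPath(db_path)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {db_path!r}")
    return URL_PREFIX + str(path)


def missing_jars(jdbc_dir: str) -> List[str]:
    """Return the required jar names that are not present in jdbc_dir.

    Hypothetical helper: jdbc_dir is assumed to be DATA_DIR/lib/jdbc.
    """
    directory = Path(jdbc_dir)
    return [name for name in REQUIRED_JARS if not (directory / name).is_file()]


if __name__ == "__main__":
    # Example path taken from the steps above.
    print(DRIVER_CLASS)
    print(ucanaccess_url("/home/johndoe/Documents/fooDB.accdb"))
```

An empty list from missing_jars, called with the path to DATA_DIR/lib/jdbc, would suggest that both named jars are in place. The three slashes in the resulting URL come from the two in the prefix plus the leading slash of the absolute path.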
Connecting to MS Access file with Date/Time data type¶
For files that contain Date/Time fields, follow the steps above and be sure to use UCanAccess 4.0.4.
