Tutorial | Integration with MongoDB#
MongoDB has become very popular over the past years, and may be encountered in many industries and projects. We will show in this article how to read and write data to MongoDB from Dataiku.
You have access to a MongoDB instance with an existing database and collections. Here, we have a local MongoDB instance, with a “dataiku” database and a “github” collection. The collection uses data from the GitHub Archive, which is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis. The raw data is provided as JSON, which makes it perfectly suitable for MongoDB.
Create the MongoDB connection#
If you use a MongoDB data source, the first thing to do in Dataiku is to create the corresponding connection.
In the Administration menu of Dataiku, under the Connections tab, click on New connection and select MongoDB.
Fill in the connection credentials as required.
If needed, you can define the connection using an advanced URI syntax.
Create a MongoDB dataset#
Once your connection created, you can easily access the data stored there by creating a new MongoDB Dataiku dataset (from the Dataset > New menu).
Browse the collections.
Select the one you want to use as dataset.
Note that you can also limit the results returned by using the Input filtering query field:
Analyze your data#
You are now ready to analyze your MongoDB data. For instance, you could use a visual data preparation script from Analyze to flatten or unfold the documents stored in your MongoDB collection:
You may also write the data to MongoDB, for instance if you wished to expose it in a web-based application. Suppose the following dataset has been built:
You could then use a visual preparation script to create a document from each record, using a custom Python formula, for instance:
Finally, you could write this data to MongoDB by deploying the script, and specifying your MongoDB connection as target:
Wait for your script to run. Your data is written into a new MongoDB collection. You may want to check the output directly from a Mongo shell to be sure it worked:
That’s it for a first taste of integrating Dataiku and MongoDB.