How-to | Protect data sources#
Dataiku Cloud enables Launchpad administrators to protect access to data sources in a variety of ways, including through fixed IP addresses, a VPN server, and AWS PrivateLink (for Amazon S3, Snowflake, or Databricks).
Restrict access to Dataiku Cloud IP addresses#
Dataiku Cloud always connects to data sources with fixed IP addresses.
To protect access, you can configure an allow list in your data source firewall. Make sure to allow both IP addresses and add them to any database grant.
The IP addresses depend on your instance’s AWS region and are listed in the Launchpad connection forms.
Note
Do not hesitate to contact us if you need assistance.
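As a rough illustration of what an allow list does on the data source side, the sketch below checks whether a client address matches any allowed entry. The addresses are placeholders from the RFC 5737 documentation ranges, not Dataiku's real IP addresses, and `is_allowed` is a hypothetical helper; use the values shown in your Launchpad connection forms.

```python
import ipaddress

def is_allowed(source_ip: str, allow_list: list[str]) -> bool:
    """Return True if source_ip matches any entry (single address or CIDR range)."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        addr in ipaddress.ip_network(entry, strict=False) for entry in allow_list
    )

# Placeholder values only: replace with both IP addresses listed in the Launchpad.
DATAIKU_IPS = ["192.0.2.10", "192.0.2.11"]

print(is_allowed("192.0.2.10", DATAIKU_IPS))    # address on the allow list
print(is_allowed("198.51.100.7", DATAIKU_IPS))  # any other address
```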
Access data sources through a VPN server#
You can configure an OpenVPN tunnel between Dataiku and your network to access your private data sources. The OpenVPN server is under your control and it exposes your data sources. Dataiku uses an OpenVPN client to establish the VPN connection and reach them.
Important
VPN is a feature of the Dataiku Cloud Enterprise edition.
Dataiku Cloud only supports OpenVPN servers.
The private subnets exposed by your OpenVPN server should not overlap with any of the following CIDR ranges: 10.0.0.0/16, 10.1.0.0/16, 172.20.0.0/16, or 10.94.0.0/16.
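Before submitting a VPN configuration, it may help to check your subnets against the ranges listed above. The following is a minimal sketch using Python's standard `ipaddress` module; `find_conflicts` and the sample subnets are hypothetical and only illustrate the overlap rule.

```python
import ipaddress

# CIDR ranges from the list above that exposed subnets should not overlap.
RESERVED = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/16", "10.1.0.0/16", "172.20.0.0/16", "10.94.0.0/16")
]

def find_conflicts(subnets: list[str]) -> list[tuple[str, str]]:
    """Return (subnet, reserved_range) pairs for every overlap found."""
    conflicts = []
    for subnet in subnets:
        network = ipaddress.ip_network(subnet, strict=False)
        for reserved in RESERVED:
            if network.overlaps(reserved):
                conflicts.append((subnet, str(reserved)))
    return conflicts

# Hypothetical subnets exposed by an OpenVPN server.
for subnet, reserved in find_conflicts(["10.0.5.0/24", "192.168.1.0/24"]):
    print(f"{subnet} overlaps reserved range {reserved}")
```

An empty result means none of the checked subnets collide with the listed ranges.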
To configure the VPN:
Go to Launchpad’s Extensions panel.
Add the VPN extension.
Provide an OpenVPN configuration file for clients.
You can choose between:
Routing all traffic | If this option is selected, all outgoing traffic from Dataiku will go through the VPN tunnel. In this case, ensure that all your data sources are accessible from your VPN server, and that your VPN server can also route traffic to the internet so your Cloud instance can function properly.
Routing the traffic to a list of IP ranges | If you deselect the Routing all traffic option, you must list all addresses or ranges for which the traffic will be routed through the VPN.
Optionally, a private DNS server can be used. This lets you use your own DNS server to resolve the domains of your private data sources that are accessed through the VPN. You must provide the IP address of this DNS server and the list of domains that should be resolved using it. All other domains will still be resolved by the regular Dataiku DNS servers.
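The split between private and regular DNS resolution can be pictured with the sketch below. It assumes that a listed domain also covers its subdomains; the exact matching rule Dataiku applies is not stated here, and `uses_private_dns` and the sample domains are hypothetical.

```python
def uses_private_dns(hostname: str, private_domains: list[str]) -> bool:
    """Return True if hostname equals, or is a subdomain of, a listed domain.

    Assumption: suffix matching on whole labels. This is an illustration,
    not Dataiku's documented behavior.
    """
    host = hostname.rstrip(".").lower()
    for domain in private_domains:
        candidate = domain.strip(".").lower()
        if host == candidate or host.endswith("." + candidate):
            return True
    return False

# Hypothetical list of domains entered in the VPN extension form.
PRIVATE_DOMAINS = ["corp.example.com"]

print(uses_private_dns("db.corp.example.com", PRIVATE_DOMAINS))  # private DNS
print(uses_private_dns("www.dataiku.com", PRIVATE_DOMAINS))      # regular DNS
```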
Note
To enable VPN tunneling, the Dataiku instance needs to be restarted. This operation could take up to 15 minutes.
Access your Amazon S3 data through AWS PrivateLink#
AWS PrivateLink provides private connectivity between your Dataiku instance and supported AWS services without exposing your traffic to the public internet. Once activated, Dataiku Cloud will only connect to your S3 buckets using one virtual private cloud (VPC) endpoint.
Important
Support of AWS PrivateLink is a feature of the Dataiku Cloud Enterprise edition.
To configure it:
First, contact our support team so we can provide you with the endpoint to use. You will need to know the AWS region of your S3 buckets.
Add or edit an S3 connection in the Launchpad’s Connections panel, check the Use Path mode box, and fill in the Region or Endpoint field with the value provided by support.
Ensure your S3 policy authorizes access to the endpoint.
Note
Athena’s Glue feature will not work with S3 connections using AWS PrivateLink.
An example of an S3 policy configured to only accept requests from a VPC endpoint:
{
  "Version": "2012-10-17",
  "Id": "Policy1415115909152",
  "Statement": [
    {
      "Sid": "Access-to-specific-VPCE-only",
      "Principal": {"AWS": "ARN-OF-IAM-USER-ASSUMED-BY-DATAIKU"},
      "Action": "s3:*",
      "Effect": "Deny",
      "Resource": [
        "S3-BUCKET-ARN",
        "S3-BUCKET-ARN/*"
      ],
      "Condition": {"StringNotEquals": {"aws:sourceVpce": "VPCE-ID"}}
    }
  ]
}
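If you manage several buckets, a policy of this shape can be generated rather than edited by hand. The sketch below is one way to do that with the standard `json` module. `build_vpce_only_policy` is a hypothetical helper, all ARNs and the endpoint ID are placeholders, and the principal is written in the `{"AWS": ...}` map form that S3 bucket policies use for IAM principals.

```python
import json

def build_vpce_only_policy(principal_arn: str, bucket_arn: str, vpce_id: str) -> dict:
    """Build a bucket policy that denies the principal any S3 action
    unless the request arrives through the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Access-to-specific-VPCE-only",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:*",
                "Effect": "Deny",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"StringNotEquals": {"aws:sourceVpce": vpce_id}},
            }
        ],
    }

# Placeholder values: substitute your own ARNs and the endpoint ID from support.
policy = build_vpce_only_policy(
    principal_arn="arn:aws:iam::123456789012:user/dataiku-example",
    bucket_arn="arn:aws:s3:::example-bucket",
    vpce_id="vpce-0123456789abcdef0",
)
print(json.dumps(policy, indent=2))
```

Because the statement is a Deny, it only restricts access; the principal still needs a separate Allow to read or write the bucket.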
Access your Snowflake database hosted on AWS through AWS PrivateLink#
AWS PrivateLink provides private connectivity between your Dataiku instance and your Snowflake account without exposing your traffic to the public internet. Once activated, Dataiku Cloud will only connect to your Snowflake account using a virtual private cloud (VPC) endpoint.
Important
Support of AWS PrivateLink is a feature of the Dataiku Cloud Enterprise edition.
This is only supported if your Snowflake account is hosted on AWS.
Follow the instructions below to configure it.
Note
If you run into any error, please contact our support team.
Ensure your Snowflake region is available in Dataiku Cloud#
In the Dataiku Cloud Launchpad, navigate to the Extensions panel.
Click + Add an Extension.
Select AWS Snowflake endpoint.
Select the AWS region of your Snowflake account. If the region you need is not available, please contact our support team to enable it.
Keep this page open, and continue to the next step in the Snowflake console.
Ask Snowflake support to allow AWS PrivateLink from Dataiku’s AWS account#
In the Snowflake console, go to the Support section in the left panel.
Create a new support case by clicking on Support Case in the top right corner.
Fill in the title with something meaningful, for example Enable AWS PrivateLink.
Copy the message from the Dataiku Cloud Launchpad extension page to the support case detail.
Update the message with your Snowflake account ID and region. Both can be found in the bottom left corner of the Snowflake console.
In the Where did the issue occur? section, select AWS PrivateLink, leave the severity at Sev-4, and click on Create Case.
Wait for Snowflake support to enable PrivateLink before continuing to the next set of instructions.
Retrieve the PrivateLink config from Snowflake#
Once Snowflake support has enabled PrivateLink, create a new SQL worksheet in Snowflake.
Run the following SQL commands with the ACCOUNTADMIN role:
alter account set ENABLE_INTERNAL_STAGES_PRIVATELINK = true;
select SYSTEM$GET_PRIVATELINK_CONFIG();
Click on the output to open a new panel on the right.
Click on the Click to Copy icon to copy the JSON result.
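Before pasting the copied result into the Launchpad, you may want to confirm that it is intact JSON, since a partial copy is an easy mistake to make. The sketch below only checks that the text parses as a non-empty JSON object. It does not assume any particular key names, and `check_privatelink_config` and the sample string are hypothetical.

```python
import json

def check_privatelink_config(raw: str) -> dict:
    """Parse the copied text and make sure it is a non-empty JSON object."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"copied text is not valid JSON: {exc}") from exc
    if not isinstance(config, dict) or not config:
        raise ValueError("expected a non-empty JSON object")
    return config

# Hypothetical stand-in for the text copied from the Snowflake worksheet.
sample = '{"example-key": "example-value"}'
print(sorted(check_privatelink_config(sample)))
```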
Create the AWS Snowflake endpoint extension in the Dataiku Cloud Launchpad#
Return to the Extensions tab of the Dataiku Cloud Launchpad.
If not still open from the first section, click + Add an Extension, and select AWS Snowflake endpoint.
Provide any string as the endpoint name. It will be helpful if it is a unique identifier.
Select your Snowflake AWS region; it should be available by now.
Check the box to confirm that Snowflake support has enabled PrivateLink for Dataiku’s account.
Paste the JSON you copied from the above set of instructions into the Snowflake PrivateLink config input.
Click Add.
Use the AWS Snowflake endpoint in your Snowflake connections#
You can now use the endpoint you created in both new and existing Snowflake connections. To do that:
In the Dataiku Cloud Launchpad, navigate to a new or existing Snowflake connection.
Select Enable AWS PrivateLink in the Snowflake connection form.
Select the endpoint you created.
Access your Databricks database hosted on AWS through AWS PrivateLink#
AWS PrivateLink provides private connectivity between your Dataiku instance and your Databricks database without exposing your traffic to the public internet. Once activated, Dataiku Cloud will only connect to your Databricks database using a virtual private cloud (VPC) endpoint.
First, check that your Databricks account and workspace meet the requirements to enable PrivateLink, then follow the instructions below to configure it.
Important
Support of AWS PrivateLink is a feature of the Dataiku Cloud Enterprise edition.
This is only supported if your Databricks account is hosted on AWS.
Note
If you run into any error, please contact our support team.
Ensure your Databricks region is available in Dataiku Cloud#
In the Dataiku Cloud Launchpad, navigate to the Extensions panel.
Click + Add an Extension.
Select AWS Databricks endpoint.
Select the AWS region of your Databricks account. If the region you need is not available, please contact our support team to enable it.
Keep this page open, and continue to the next step in the Databricks console.
Configure your Databricks account for PrivateLink#
The following steps must be performed by a Databricks administrator in your Databricks console.
Register Dataiku’s VPC endpoint provided in the extension form. You can refer to Databricks’s documentation to do so. Note that the region to fill in is your Databricks workspace AWS region, not Dataiku’s.
Ensure your Private Access Settings (PAS) configuration allows this registered VPC endpoint to connect to your workspace. See Databricks’s article to learn more.
Create the AWS Databricks endpoint extension in the Dataiku Cloud Launchpad#
Return to the Extensions tab of the Dataiku Cloud Launchpad.
If not still open from the first section, click + Add an Extension, and select AWS Databricks endpoint.
Provide any string as the endpoint name. It will be helpful if it is a unique identifier.
Select your Databricks AWS region; it should be available by now.
Check the box to confirm that your Databricks account is configured for PrivateLink.
Fill in the URL of your Databricks workspace.
Click Add.
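A quick sanity check on the workspace URL can catch copy-paste problems before you submit the form. The sketch below assumes the form expects the HTTPS base URL of the workspace; the Launchpad's own validation is not described here, and `normalize_workspace_url` and the sample URL are hypothetical.

```python
from urllib.parse import urlparse

def normalize_workspace_url(url: str) -> str:
    """Reduce a pasted URL to its HTTPS base, dropping path and query string.

    Assumption: the base URL of the workspace is what the form needs.
    """
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"expected an https:// workspace URL, got {url!r}")
    return f"https://{parsed.hostname}"

# Hypothetical URL copied from a browser address bar.
print(normalize_workspace_url("https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/?o=123"))
```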
Use the AWS Databricks endpoint in your Databricks connections#
You can now use the endpoint you created in both new and existing Databricks connections. To do that:
In the Dataiku Cloud Launchpad, navigate to a new or existing Databricks connection.
Select Enable Databricks PrivateLink in the Databricks connection form.
Select the endpoint you created.