Reference | Data transfer between cloud storage locations

You can categorize data transfer between cloud storage providers into two types of traffic:

Egress traffic

The volume of data leaving your Dataiku instance

Ingress traffic

The volume of data entering your Dataiku instance

The largest sources of data transfer between cloud storage providers occur when you:

  • Leverage Dataiku Cloud-managed compute (DSS engine or containers for Spark jobs or ML model training) on data located in a cloud provider or region different from that of your Dataiku instance.

  • Sync data from your cloud storage to Dataiku Cloud-managed S3.

For all Dataiku Cloud editions, data transfer out of Dataiku Cloud is limited to 1,000 GB per month. You can visualize the volume of data entering or leaving your Dataiku instance in your Launchpad’s Usage & Monitoring panel.

Important

Dataiku can lift this 1,000 GB limit in response to a justified customer request. Contact Support to learn more.
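To make the limit concrete, the following minimal sketch tallies a month's egress against the 1,000 GB cap. The daily figures and the function name `egress_status` are hypothetical and only for illustration. Dataiku Cloud doesn't provide this helper; actual usage figures come from the Usage & Monitoring panel.

```python
# Monthly limit on data transfer out of Dataiku Cloud, as stated above.
MONTHLY_EGRESS_LIMIT_GB = 1000


def egress_status(daily_egress_gb, limit_gb=MONTHLY_EGRESS_LIMIT_GB):
    """Summarize egress for the month so far (hypothetical helper).

    daily_egress_gb: iterable of egress volumes in GB, one per day,
    entered by hand from your own monitoring figures.
    """
    used = sum(daily_egress_gb)
    return {
        "used_gb": used,
        "remaining_gb": max(limit_gb - used, 0),
        "over_limit": used > limit_gb,
    }


# Illustrative values only, not real measurements.
status = egress_status([100, 250, 300])
print(status)
```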

Reducing data transfer volume and costs

Data transfer between cloud storage providers can be considerable. In general, you can use the following strategies to reduce the data transfer volume and cost:

  • Work with compressed file formats.

  • Leverage push-down compute in-database as much as possible.

  • If you don’t have SQL-type cloud data storage, leverage the managed S3 storage for intermediate datasets in your Flows, and sync only the first input and last output to your external cloud storage to minimize data transfer.
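The effect of the last strategy can be estimated with simple arithmetic. The sketch below assumes a simplified model: a linear Flow that runs on Dataiku Cloud-managed compute, in which each recipe reads one dataset and writes the next. The dataset sizes and function names are hypothetical, and real transfer volumes depend on your engines, formats, and storage layout.

```python
def transfer_all_external(sizes_gb):
    """All datasets live in external cloud storage.

    Under this simplified model, every recipe reads its input into the
    instance (ingress) and writes its output back out (egress).
    sizes_gb lists dataset sizes in Flow order: input, intermediates, output.
    """
    ingress = sum(sizes_gb[:-1])  # every dataset except the last is read
    egress = sum(sizes_gb[1:])    # every dataset except the first is written
    return ingress, egress


def transfer_managed_intermediates(sizes_gb):
    """Intermediate datasets stay on managed S3.

    Only the first input is synced in and only the last output is synced out.
    """
    return sizes_gb[0], sizes_gb[-1]


# Hypothetical Flow: one input, two intermediate datasets, one output (GB).
flow_sizes_gb = [200, 150, 80, 10]

print(transfer_all_external(flow_sizes_gb))           # (430, 240)
print(transfer_managed_intermediates(flow_sizes_gb))  # (200, 10)
```

In this illustrative case, keeping intermediate datasets on managed S3 reduces egress from 240 GB to 10 GB, because the intermediate results never leave the instance.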