Dataiku Knowledge Base

User Guide

  • Getting Started
    • Quick Starts
      • Quick Start | Dataiku for data preparation
      • Quick Start | Dataiku for machine learning
      • Quick Start | Dataiku for MLOps
      • Quick Start | Dataiku for AI collaboration
      • Quick Start | Excel to Dataiku
        • Concept | From Excel to Dataiku
      • Quick Start | Alteryx to Dataiku
      • Quick Start | Dataiku for manufacturing data preparation and visualization
    • Dataiku User Interface
      • Concept | Dataiku Cloud Launchpad
      • Concept | Dataiku Design homepage
      • Concept | Project
      • Concept | Flow
      • Concept | Searching in Dataiku
      • Concept | Flow views, search, and filter
      • Tutorial | Explore the Flow
      • Tutorial | Flow zones
      • Reference | Navigation bar
      • Reference | Right panel navigation
      • How-to | Duplicate a Dataiku project
      • How-to | Find the Dataiku version
      • How-to | Rearrange Flow zones
      • Tip | Flow navigation shortcuts
      • Tip | Anchoring for Flow management
      • Tip | Hide or show Flow items
      • Tip | Using project folders
  • Data Sourcing
    • Data Connections
      • Concept | Data connections
      • Concept | Architecture model for databases
      • Concept | Connection changes
      • Tutorial | Configure a connection between Dataiku and an SQL database
      • Tutorial | Data transfer with visual recipes
      • Reference | A primer on connecting to data sources
      • How-to | Remap a connection when importing a project to a Dataiku instance
      • How-to | Utilize MS Access
    • Dataiku Datasets
      • Concept | Dataiku datasets
      • Concept | Dataset characteristics
      • Concept | Sampling on datasets
      • Concept | Dataset conditional formatting
      • Concept | Analyze data quality in the Explore tab
      • Tutorial | Getting started with datasets
      • How-to | Rename a dataset
      • How-to | Reorder or hide dataset columns
      • How-to | Export a filtered dataset
      • How-to | Apply a filter to summary statistics in the Analyze window
      • Tip | Good dataset naming schemes
      • FAQ | Why can’t I drag a folder into Dataiku?
      • FAQ | Where can I see how many records are in my entire dataset?
  • Data Preparation
    • Visual Recipes
      • Concept | Recipes in Dataiku
      • Concept | Sync recipe
      • Concept | Group recipe
      • Concept | Join recipe
      • Concept | Distinct recipe
      • Concept | Pivot recipe
      • Concept | Sample/Filter recipe
      • Concept | Sort recipe
      • Concept | Split recipe
      • Concept | Stack recipe
      • Concept | Top N recipe
      • Concept | Window recipe
      • Concept | Fuzzy join recipe
      • Concept | Geo join recipe
      • Concept | Labeling recipe
      • Concept | Common steps in visual recipes: Pre-filter, Post-filter, & Computed columns
      • Concept | Dynamic dataset and recipe repeat
      • Concept | Generate recipes using Generative AI
      • Tutorial | Group recipe
      • Tutorial | Join recipe
      • Tutorial | Distinct recipe
      • Tutorial | Pivot recipe
      • Tutorial | Top N recipe
      • Tutorial | Window recipe
      • Tutorial | Fuzzy join recipe
      • Tutorial | In-database data visualization and preparation
      • Tutorial | Geo join recipe
      • Tutorial | Compute isochrones and routes with the Geo Router plugin
      • Tutorial | Working with shapefiles and US census data
      • Tutorial | Dynamic recipe repeat
      • How-to | Insert or delete a recipe within the Flow
      • How-to | Segment your data using statistical quantiles
    • Prepare Recipe
      • Concept | Prepare recipe
      • Tutorial | Prepare recipe
      • Tutorial | Smart pattern builder for string pattern extraction
      • Tutorial | Visual logic processors for data preparation
      • Tutorial | Geographic processors
      • Tutorial | Enrich web logs in the Prepare recipe
      • Reference | Performing joins in the Prepare recipe
      • Reference | Using custom Python functions in the Prepare recipe
      • Reference | Handling decimal notations
      • How-to | Normalize number formats in a Prepare recipe
      • How-to | Handle accounting-style negative numbers
      • How-to | Copy-paste Prepare recipe steps
      • How-to | Apply Prepare steps to multiple columns
      • How-to | Standardize text fields using fuzzy values clustering
      • How-to | Reshape data from wide to long format
      • How-to | Generate Prepare recipe steps with AI
    • Dataiku Formulas
      • Concept | Dataiku formulas
      • Concept | Dataiku formulas cheat sheet
      • Concept | Safe sums across columns in Dataiku formulas
      • Tutorial | Relative referencing in Dataiku formulas
      • How-to | Remove scientific notation in a column
      • How-to | Pad a number with leading zeros
      • How-to | Fill empty cells of a column with the value of the corresponding row from another column
      • FAQ | In a formula, how can I check if a variable belongs to a set of values?
    • Data Pipelines & Computation Engines
      • Concept | Computation engines
      • Concept | Build modes
      • Concept | Data pipeline optimization
      • Concept | Where computation happens in Dataiku
      • Tutorial | Build modes
      • Tutorial | Recipe engines
      • How-to | Access job information
      • How-to | Enable SQL pipelines in the Flow
    • The Lab
      • Concept | Visual analyses in the Lab
      • Tutorial | Visual analyses in the Lab
    • Managing Dates
      • Concept | Date handling in Dataiku
      • Reference | How Dataiku handles and displays date and time
    • From Excel to Dataiku
      • Tutorial | Relative referencing in Dataiku formulas
      • How-to | Work with editable datasets
      • How-to | Import an Excel workbook
      • Reference | Data cleaning
      • Reference | Using formulas
      • Reference | Working with dates
      • Reference | Removing duplicates
      • Reference | Filtering rows
      • Reference | Sampling rows
      • Reference | Split a dataset
      • Reference | Append datasets
      • Reference | Joining datasets
      • Reference | Aggregate and pivot
      • Reference | Sorting values
      • Reference | Top values
    • From Alteryx to Dataiku
      • Reference | Alteryx to Dataiku concept mapping
  • Data Visualization
    • Charts
      • Concept | Charts
      • Concept | In-database charts
      • Tutorial | Charts
      • Tutorial | Pivot tables
      • Tutorial | Paneled and animated charts
      • Tutorial | Custom aggregation for charts
      • Tutorial | No-code maps
      • FAQ | How do I display non-aggregated metrics in charts?
      • FAQ | How do I sort on a measure not displayed in charts?
    • Dashboards
      • Concept | Dashboards
      • Tutorial | Use dashboards to build reports
      • Tutorial | Dashboard management
      • How-to | Manage sampling on insights
      • Reference | Understand source data for filters
      • Troubleshoot | Can’t display a web content insight in a dashboard
    • Webapps
      • Concept | Webapps
      • How-to | Display an image in a Bokeh webapp
    • Static Insights
      • Concept | Static insights
      • Tutorial | Static insights
    • Visualization Plugins
      • Concept | Data visualization plugins
  • Collaboration
    • Collaboration Overview
      • Concept | Collaboration
    • Wikis & Flow Documentation
      • Concept | Explain the Flow with generative AI
      • Concept | Workflow documentation in a wiki
      • Reference | Using the project wiki
      • Reference | Sharing and promoting wikis
      • How-to | Create a wiki article
      • How-to | Export a wiki to a PDF
      • How-to | Generate and export Flow documentation
      • Tip | Link Dataiku objects in a wiki article
    • Tags & Object Descriptions
      • Concept | Tags
      • Tip | Suggestions for using tags
      • Tip | Commenting to document Dataiku objects
    • Sharing Projects & Dataiku Assets
      • Concept | Project permissions and asset sharing
      • Concept | Data Catalog
      • Reference | Managing project access
      • How-to | Set up limited access to projects
      • How-to | Manage project access requests
      • How-to | Share project to non-Dataiku users
      • How-to | Manage object sharing
      • How-to | Enable quick sharing of datasets and objects
      • How-to | Copy Flow items to a new or existing project
    • Discussions
      • Concept | Discussions
      • Reference | Managing discussions
      • How-to | Start discussions in a Dataiku object
    • Workspaces
      • Concept | Workspaces
      • Reference | Centralized versus delegated workspaces
      • How-to | Create a workspace
      • How-to | Share a workspace to non-Dataiku users
    • Project Version Control
      • Concept | Version control for Dataiku projects
      • Tutorial | Git for projects
      • How-to | Undo actions in Dataiku
    • Stories
      • Concept | Dataiku stories
      • Tutorial | Dataiku stories with Generative AI
      • Tutorial | Dataiku stories
      • Reference | Story user interface
      • How-to | Enable Story AI
      • How-to | Import a story
  • Data Quality & Automation
    • Variables
      • Concept | Variables in Dataiku
      • Tutorial | Project variables in visual recipes
      • Tutorial | Coding with variables
    • Data Quality
      • Concept | Metrics
      • Concept | Checks
      • Concept | Data quality rules
      • Concept | Metrics & checks (pre-12.6)
      • Concept | Data lineage
      • Tutorial | Data quality
      • Tutorial | Custom metrics, checks, and data quality rules
      • Tutorial | Data quality and SQL metrics
      • FAQ | What’s the difference between distinct and unique value count metrics?
    • Automation Scenarios
      • Concept | Automation scenarios
      • Concept | Custom metrics, checks, data quality rules & scenarios
      • Tutorial | Automation scenarios
      • Tutorial | Scenario reporters
      • Tutorial | Webhook reporters in scenarios
      • Tutorial | Custom step-based scenarios
      • Tutorial | Custom script scenarios
      • How-to | Automate documentation exports in a scenario
      • How-to | Build missing partitions with a scenario
      • Code Sample | Set a timeout for a scenario build step
      • Code Sample | Set email recipients in a “Send email” reporter
      • FAQ | Can I control which datasets in my Flow get rebuilt during a scenario?
    • Dataiku Applications
      • Concept | Dataiku applications
      • Tutorial | Dataiku applications
      • Reference | Use cases of Dataiku applications
    • Partitioning
      • Concept | Partitioning
      • Concept | How partitioning adds value
      • Concept | Partitioned datasets
      • Concept | Jobs with partitioned datasets
      • Concept | Partitioning by pattern
      • Concept | Partitioning in a scenario
      • Concept | Partition redispatch and collection
      • Tutorial | File-based partitioning
      • Tutorial | Column-based partitioning
      • Tutorial | Partitioning in a scenario
      • Tutorial | Repartition a non-partitioned dataset
      • Tip | Interacting with partitioned datasets using the Python API
  • Machine Learning & Analytics
    • Interactive Statistics
      • Concept | Statistics worksheets
      • Concept | Statistics cards
      • Concept | Generate statistics recipe
      • Concept | Variable types for interactive statistics
      • Concept | Factor and response roles in statistics cards
      • Concept | Statistics cards for fit curves and distributions
      • Concept | Correlation matrices in statistical worksheets
      • Concept | Principal Component Analysis (PCA)
      • Concept | Hypothesis testing
      • Concept | Hypothesis test categories
      • Concept | Grouping variables in statistical testing
      • Concept | Adjustment methods for hypothesis test cards
      • Tutorial | Interactive statistics
      • How-to | Export a statistics card as a recipe
    • Machine Learning Concepts
      • Concept | Introduction to machine learning
      • Concept | Predictive modeling
      • Concept | Model validation
      • Concept | Model evaluation
      • Concept | Regression algorithms
      • Concept | Classification algorithms
      • Concept | Clustering algorithms
    • Feature Engineering
      • Concept | Data preparation for machine learning
      • Concept | Generate Features recipe
      • Tutorial | Generate Features recipe
      • Tutorial | Events aggregator plugin
    • AutoML Model Design
      • Concept | Quick models in Dataiku
      • Concept | The Design tab within the visual ML tool
      • Concept | Features handling
      • Concept | Multimodal ML using LLMs
      • Concept | Feature generation & reduction
      • Concept | Algorithm and hyperparameter selection
      • Concept | ML diagnostics
      • Concept | ML assertions
      • Tutorial | Machine learning basics
      • Tutorial | Model overrides
      • Tutorial | ML diagnostics
      • Tutorial | ML assertions
      • Tutorial | Clustering (unsupervised) models with visual ML
      • Tutorial | MLlib with Dataiku
      • How-to | Distributed hyperparameter search
      • FAQ | How does the AutoML tool automatically select or reject features when training a model?
      • Troubleshoot | In visual ML, I get the error “All values of the target are equal” when they’re not
    • AutoML Model Results
      • Concept | The Result tab within the visual ML tool
      • Concept | Model summaries within the visual ML tool
      • Concept | Explainable AI
      • Concept | Partial dependence plots
      • Concept | Subpopulation analysis
      • Concept | Individual prediction explanations
      • Concept | What if? analysis
      • Concept | Advanced What if? simulators
      • Concept | Interpretation of regression model output
      • Tutorial | Advanced What if? simulators
      • Tutorial | Exporting a model’s preprocessed data with a Jupyter notebook
      • How-to | Set up What if? analysis for a dashboard consumer
      • FAQ | Why don’t the values in the Visual ML chart match the final scores for each algorithm?
    • Model Scoring
      • Concept | Model deployment to the Flow
      • Concept | Scoring data
      • Concept | Model validation and evaluation
      • Tutorial | Model scoring basics
    • Custom Models in Visual ML
      • Concept | Custom preprocessing within the visual ML tool
      • Concept | Custom modeling within the visual ML tool
      • Concept | Tuning XGBoost models in Python
      • Tutorial | Custom preprocessing & modeling within visual ML
      • Tutorial | Azure AutoML from a Dataiku notebook
    • Time Series
      • Concept | Introduction to time series
      • Concept | Time series data types and formats
      • Concept | Time series components
      • Concept | Objectives of time series analysis
      • Concept | Time series analysis with interactive statistics
      • Concept | Time series preparation
      • Concept | Time series resampling
      • Concept | Time series interval extraction
      • Concept | Time series windowing
      • Concept | Time series extrema extraction
      • Concept | Time series forecasting
      • Tutorial | Time series analysis
      • Tutorial | Time series forecasting (Visual ML)
      • Tutorial | Time series preparation
      • Tutorial | Forecasting time series data with R and Dataiku
      • Tutorial | Deep learning for time series
      • Tutorial | Export preprocessed data (for time series models)
    • Causal Prediction
      • Concept | Causal prediction
      • Tutorial | Causal prediction
    • Text Processing
      • Concept | Regular expressions in Dataiku
      • Concept | Introduction to natural language processing (NLP)
      • Concept | Challenges of natural language processing (NLP)
      • Concept | Cleaning text data
      • Concept | Handling text features for machine learning
      • Tutorial | Build a text classification model
    • Images
      • Concept | Pre-trained image classification models
      • Concept | Optimization of image classification models
      • Concept | Object detection
      • Tutorial | Image classification without code
      • Tutorial | Image classification with code
      • Tutorial | Object detection without code
      • How-to | Prepare images for use in a model
    • Geospatial Analytics
      • Concept | Geo join recipe
      • Tutorial | Geographic processors
      • Tutorial | No-code maps
      • Tutorial | Geo join recipe
      • Tutorial | Compute isochrones and routes with the Geo Router plugin
      • Tutorial | Working with shapefiles and US census data
      • Reference | Overview of Dataiku’s geospatial features
    • Partitioned Models
      • Concept | Partitioned models
      • Tutorial | Partitioned models
      • How-to | Train a stratified or partitioned model
    • Deep Learning
      • Tutorial | Deep learning within visual ML
      • Tutorial | Deep learning for time series
    • Active Learning
      • Tutorial | Active learning for classification problems
      • Tutorial | Active learning for object detection problems
      • Tutorial | Help on active learning webapp
      • Tutorial | Active learning for object detection problems using Dataiku apps
      • Tutorial | Active learning for tabular data classification problems using Dataiku apps
    • Responsible AI
      • Concept | Responsible AI
      • Concept | Dangers of irresponsible AI
      • Concept | Responsible AI in the data science practice
      • Concept | Basics of bias
      • Concept | Model fairness
      • Concept | Evaluating group fairness
      • Concept | Interpretability
      • Concept | Model transparency
      • Concept | Deployment biases
      • Tutorial | Responsible AI training
      • Reference | RAI further reading
  • Generative AI and Large Language Models (LLMs)
    • LLM Administration
      • Concept | LLM connections
      • Concept | Guardrails against risks from Generative AI and LLMs
    • Text Processing with Visual LLM Recipes
      • Concept | Large language models and the LLM Mesh
      • Concept | Classify text recipe
      • Concept | Summarize text recipe
      • Concept | Prompt Studios and Prompt recipe
      • Tutorial | Classify text with Generative AI
      • Tutorial | Summarize text with Generative AI
      • Tutorial | Prompt engineering with LLMs
      • Tutorial | Processing text with the Prompt recipe
    • Retrieval Augmented Generation (RAG)
      • Concept | Embed recipes and Retrieval Augmented Generation (RAG)
      • Tutorial | Retrieval Augmented Generation (RAG) with the Embed dataset recipe
      • Tutorial | Build a multimodal knowledge bank for a RAG project
      • Tutorial | Build a conversational interface with Dataiku Answers
    • LLMOps
      • Tutorial | LLM evaluation
  • Code
    • Getting Started with Code in Dataiku
      • Concept | Code notebooks
      • Concept | Code recipes
      • Tutorial | Code notebooks and recipes
    • Python and Dataiku
      • Tutorial | Code notebooks and recipes
      • Tutorial | SQL from a Python recipe in Dataiku
      • Tutorial | Sessionization in SQL, Hive, Python, and Pig
      • Tutorial | PySpark in Dataiku
      • Reference | Reading or writing a dataset with custom Python code
      • How-to | Enable auto-completion in a Jupyter notebook
      • Code Sample | Access info about datasets
    • SQL and Dataiku
      • Concept | SQL notebooks
      • Concept | SQL code recipes
      • Concept | AI SQL Assistant
      • Tutorial | SQL notebooks and recipes
    • R and Dataiku
      • Tutorial | Dataiku for R users
      • Tutorial | R Markdown reports
      • Tutorial | Forecasting time series data with R and Dataiku
      • Tutorial | R Shiny webapps
      • Reference | Upgrading and rolling back the R version used in Dataiku
      • How-to | Edit Dataiku recipes in RStudio
      • Troubleshoot | R recipes aren’t working after upgrading or migrating the instance
    • Work Environment
      • Concept | Code environments
      • Concept | External IDE integrations
      • Tutorial | My first Code Studio
      • How-to | Create a code environment
      • How-to | Set a code environment
      • How-to | Edit Dataiku projects and plugins in VS Code
      • How-to | Edit Dataiku projects and plugins in PyCharm
      • How-to | Edit Dataiku projects and plugins in Sublime
      • How-to | Edit Dataiku recipes in RStudio
      • FAQ | Why should I use a code environment?
    • Shared Code
      • Concept | Introduction to shared code
      • Concept | Shared code libraries
      • Concept | Importing code from a remote Git repository
      • Concept | Code samples
      • Tutorial | Shared code
      • Tutorial | Cloning a library from a remote Git repository
      • How-to | Import a notebook from GitHub
      • Tip | Best practices for notebook development between GitHub and Dataiku
    • Dataiku APIs
      • Concept | Dataiku APIs
      • Concept | The dataiku package
      • Concept | Dataiku public API
      • Concept | Usage of Dataiku APIs outside of Dataiku
      • Tutorial | Dataiku public API
      • Tip | Using the API within Dataiku (Basics)
      • Tip | Automating work in Dataiku with the API
      • Tip | Administering Dataiku remotely
    • Managed Folders
      • Concept | Managed folders
      • Tutorial | Managed folders
  • MLOps & Operationalization
    • MLOps Architecture
      • Concept | Definition, challenges, and principles of MLOps
      • Concept | How model development impacts MLOps
      • Concept | Model packaging for deployment
      • Concept | Dataiku architecture for MLOps
    • Batch Deployment
      • Concept | Automation node preparation
      • Concept | Batch deployment
      • Tutorial | Batch deployment
    • Test Scenarios
      • Tutorial | Test scenarios
    • API Deployment
      • Concept | Real-time APIs
      • Concept | API endpoints
      • Concept | API query enrichments
      • Concept | API Deployer
      • Tutorial | Real-time API deployment
    • Model Monitoring
      • Concept | Process governance for MLOps
      • Concept | Model comparisons
      • Concept | Model evaluation stores
      • Concept | Monitoring model performance and drift in production
      • Concept | Monitoring and feedback in the AI project lifecycle
      • Tutorial | Model monitoring with a model evaluation store
      • Tutorial | API endpoint monitoring
      • Tutorial | Model monitoring in different contexts
      • Tutorial | Deployment automation
      • FAQ | How can I get model monitoring metrics in a dataset format?
    • External Models
      • Tutorial | Surface external models within Dataiku
    • Dataiku Govern
      • Concept | Introducing Dataiku Govern
      • Concept | Centralization in Dataiku Govern
      • Concept | Governance layers
      • Concept | Govern item pages
      • Concept | Workflows and project qualification
      • Concept | Governed projects
      • Concept | Business initiatives
      • Concept | Sign-off process
      • Concept | Model maintenance in Dataiku Govern
      • Concept | Govern roles and permissions
      • Concept | Customizing a Dataiku Govern instance
      • Tutorial | Dataiku Govern framework
      • Tutorial | Govern roles and permissions
      • Tutorial | Blueprint Designer
      • Tutorial | Custom Pages Designer
      • Tutorial | Use imported templates in the Blueprint Designer
      • How-to | Export Govern items
      • How-to | Switch artifact templates (blueprint versions)
      • How-to | Subscribe to email notifications
      • How-to | Export and import blueprints and blueprint versions
      • How-to | Add role assignment rules to a Govern item
      • Tip | Embed a dashboard in Dataiku Govern
    • CI/CD Pipelines
      • Tutorial | Getting started with CI/CD pipelines with Dataiku
      • Tutorial | Jenkins pipeline for API services in Dataiku
      • Tutorial | Jenkins pipeline for Dataiku with the Project Deployer
      • Tutorial | Azure pipeline for Dataiku with the Project Deployer
      • Tutorial | Jenkins pipeline for Dataiku without the Project Deployer
    • Feature Store
      • Tutorial | Building your feature store in Dataiku
      • How-to | Add a dataset to the feature store
      • How-to | Add a feature group to the Flow
  • Plugins
    • Plugin Usage
      • Concept | Plugin management
      • Concept | Plugins in Dataiku
      • How-to | Install a plugin
      • How-to | Update a plugin
      • FAQ | Are plugins supported?
      • FAQ | Where can I find the details of a plugin?
    • Plugin Development
      • Concept | Plugin development
      • Concept | Development plugins
      • Concept | Git integration for plugins
      • Reference | Plugin naming policies and conventions
      • Reference | IDE setup to develop Dataiku plugins
      • How-to | Clone a plugin from a remote Git repository
      • How-to | Share a plugin as a zip archive
      • How-to | Edit a plugin
      • FAQ | Why should I create plugins?
      • FAQ | What are some examples of plugins?
      • FAQ | Where can I find the code for a plugin?

Dataiku Cloud

  • Space Management
    • Free Trials of Dataiku Cloud
      • How-to | Begin a free trial from Dataiku
      • How-to | Begin a free trial from Snowflake Partner Connect
      • Tip | Working with Snowflake Partner Connect sample projects
    • Users, Profiles & Groups on Dataiku Cloud
      • Reference | Permission management on Dataiku Cloud
      • How-to | Invite users to your Dataiku Cloud space
      • How-to | Automatically attribute profiles and groups to users
      • How-to | Automatically invite users to your instance
      • How-to | Use trial seats
      • How-to | Activate single sign-on (SSO)
      • Troubleshoot | The invited user didn’t receive an email
    • Support on Dataiku Cloud
      • How-to | Contact support on Dataiku Cloud
      • How-to | Grant Dataiku support access to your instance
      • FAQ | Should I email support@dataiku.com if I need help?
    • Production Nodes on Dataiku Cloud
      • How-to | Install the Automation node
      • How-to | Install the API node
      • How-to | Use the referenced data deployment mode on Dataiku Cloud
      • How-to | Deploy an API service from the Automation node on Dataiku Cloud
  • Data Transfer and Security on Dataiku Cloud
    • Reference | Relocatable datasets
    • Reference | Data transfer between cloud storage locations
    • How-to | Secure data connections through AWS PrivateLink
    • How-to | Secure data connections through Azure Private Link
    • How-to | Secure data connections through GCP Private Service Connect
    • How-to | Restrict access to Dataiku Cloud IP addresses
    • How-to | Access data sources through a VPN server
  • Compute and Resource Quotas on Dataiku Cloud
    • Reference | Overview of compute engines on Dataiku Cloud
    • Reference | Leveraging fully managed elastic AI compute
    • Reference | Managing elastic AI compute capacity
    • Reference | Managing containerized execution configurations
    • Reference | Resource quota management
    • Tip | Choosing container sizes
    • Tip | Using Spark
    • Troubleshoot | The job takes an unusually long time to complete
    • Troubleshoot | The job queues for a long time and then fails without ever starting

Additional Offerings

  • Dataiku Solutions
    • Retail & CPG
      • Solution | Customer Satisfaction Reviews
      • Solution | Demand Forecast
      • Solution | Distribution Spatial Footprint
      • Solution | Market Basket Analysis
      • Solution | Product Recommendation
      • Solution | Customer Lifetime Value Forecasting
      • Solution | RFM Segmentation
      • Solution | Inventory Allocation Optimization with Grid Dynamics
      • Solution | Markdown Optimization
      • Solution | Store Segmentation
    • Financial Services & Insurance
      • Solution | AML Alerts Triage
      • Solution | Credit Card Fraud
      • Solution | Insurance Claims Modeling
      • Solution | Credit Scoring
      • Solution | Customer Segmentation for Banking
      • Solution | News Sentiment Stock Alert System
      • Solution | Next Best Offer for Banking
      • Solution | Credit Risk Stress Testing (CECL, IFRS9)
      • Solution | Lead Scoring
    • Health & Life Sciences
      • Solution | Optimizing Omnichannel Marketing
      • Solution | Pharmacovigilance
      • Solution | Social Determinants of Health
How-to | Secure data connections through Azure Private Link#

On eligible plans, Dataiku enables Launchpad administrators to protect access to supported data sources through Azure Private Link.

Azure Private Link provides private connectivity between your Dataiku instance and supported Azure services without exposing your traffic to the public internet. Once activated, Dataiku Cloud will only connect to your data using a private endpoint.

Important

Azure Private Link isn’t available in all Dataiku plans. Reach out to your Dataiku Account Manager or Customer Success Manager to confirm availability.

Dataiku only supports one Private Link per storage account.

If you run into any errors, please contact our support team.

This article covers configuring Azure Private Link for the following data sources:

  • Azure Blob Storage

  • An Azure-hosted Snowflake database

  • An Azure-hosted Databricks database

  • An Azure-hosted arbitrary data source

  • An Azure SQL database

  • An Azure managed database

  • An Azure Synapse Analytics workspace

  • An on-premises data source

Azure Blob Storage#

To configure Azure Private Link for an Azure Blob Storage data source:

Ensure your Azure region is available in Dataiku Cloud#

  1. In the Dataiku Cloud Launchpad, navigate to the Extensions panel.

  2. Click + Add an Extension.

  3. Select Azure storage endpoint.

  4. Select the Azure Storage Account region. If the region you need isn’t available, please contact the support team to enable it.

Fill out the Azure Storage endpoint form#

Retrieve the storage account name, the resource group, and the subscription ID from your Azure Storage Account page.

Accept the Private Link request on Azure#

Navigate to the Private Link Center in your Azure account, then to Pending Connections, and accept the two connection requests. Private Link functionality will only be enabled after you accept these requests.

Create the Azure Blob Storage connection#

You can now use the endpoint you created both in new and existing Azure Blob Storage connections:

  1. In the Dataiku Cloud Launchpad, navigate to the Connections panel.

  2. Select + Add a Connection.

  3. Select Azure Blob Storage and fill the form.
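To confirm the new connection is registered, you can list connections through the Dataiku public API. This is a hedged sketch: it assumes the `dataikuapi` package, a valid API key, and that Azure Blob connections report type `Azure` — check the values on your own instance.

```python
def connections_of_type(client, conn_type):
    """Return the names of connections whose type matches conn_type.

    `client` is expected to behave like dataikuapi.DSSClient, whose
    list_connections() returns a dict of {name: description}.
    """
    return sorted(
        name
        for name, desc in client.list_connections().items()
        if desc.get("type") == conn_type
    )

# Typical (hypothetical) usage against a real instance:
# import dataikuapi
# client = dataikuapi.DSSClient("https://your-instance.dataiku.io", "API_KEY")
# print(connections_of_type(client, "Azure"))  # assumed type string for Blob
```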

An Azure-hosted Snowflake database#

To configure Azure Private Link for a Snowflake database hosted on Azure:

Ensure your Snowflake region is available in Dataiku Cloud#

  1. In the Dataiku Cloud Launchpad, navigate to the Extensions panel.

  2. Click + Add an Extension.

  3. Select Azure Snowflake endpoint.

  4. Select the Azure region of your Snowflake account. If the region you need isn’t available, please contact the support team to enable it.

  5. Keep this page open, and continue to the next step in the Snowflake console.

Retrieve the Private Link config from Snowflake#

  1. In the Snowflake console, create a new SQL worksheet.

  2. Run the following SQL commands with the ACCOUNTADMIN role:

    alter account set ENABLE_INTERNAL_STAGES_PRIVATELINK = true;
    select SYSTEM$GET_PRIVATELINK_CONFIG();
    
  3. Click on the output to open a new panel on the right.

  4. Click on the Click to Copy icon to copy the JSON result.

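The copied result is a JSON object. As a hedged illustration, you can inspect it with a few lines of Python before pasting it into the Launchpad. The key names and values below are assumptions for demonstration only — trust whatever your Snowflake account actually returns.

```python
import json

# Illustrative sample only -- your account's output will differ.
sample = """{
  "privatelink-account-name": "xy12345.west-europe.privatelink",
  "privatelink-account-url": "xy12345.west-europe.privatelink.snowflakecomputing.com",
  "privatelink-pls-id": "sf-pls-azure-westeurope.example"
}"""

config = json.loads(sample)
for key, value in config.items():
    print(f"{key}: {value}")
```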

Create the Azure Snowflake endpoint extension in the Dataiku Cloud Launchpad#

  1. Return to the Extensions tab of the Dataiku Cloud Launchpad.

  2. If not still open from the first section, click + Add an Extension, and select Azure Snowflake endpoint.

  3. Provide a name for the endpoint. A unique, descriptive identifier is easiest to manage later.

  4. Select your Snowflake Azure region; it should be available by now.

  5. Paste the JSON you copied earlier into the Snowflake Private Link config input.

  6. Click Add.

  7. You can close this page while Dataiku creates the underlying infrastructure. This operation usually takes 10 minutes.

  8. Once the infrastructure is created, refresh the page, click on the endpoint in the list, and then click View Details.

Ask Snowflake support to allow Azure Private Link from Dataiku’s Azure account#

  1. In the Snowflake console, go to the Support section in the left panel.

  2. Create a new support case by clicking on Support Case in the top right corner.

  3. Give the case a meaningful title, for example Enable Azure Private Link.

  4. In the details section of your Snowflake support case, request approval of the endpoint you created. The endpoint resource ID can be found in the View Details message.

  5. In the Where did the issue occur? section, select Azure Private Link under the Managing Security & Authentication category, leave the severity at Sev-4, and click on Create Case.

  6. Wait for Snowflake support to enable Private Link before continuing to the next set of instructions.

Use the Azure Snowflake endpoint in your Snowflake connections#

You can now use the endpoint you created both in new and existing Snowflake connections. To do that:

  1. In the Dataiku Cloud Launchpad, navigate to a new or existing Snowflake connection.

  2. For the host value, use the host from the View Details message.

Note

Your Snowflake connection may use an Azure Blob fast-write connection. In that case, you have to set up Private Link for it as described in Azure Blob Storage if you also want that traffic to go through Private Link.
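To sanity-check the host value, you can confirm it resolves from inside your Dataiku instance (for example, from a Python notebook there). A minimal sketch, in which the commented-out privatelink host is hypothetical — substitute the host from your View Details message:

```python
import socket


def resolve(host):
    """Return the IP address a hostname resolves to, or None on failure."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None


# Hypothetical privatelink host -- substitute your own:
# print(resolve("xy12345.west-europe.privatelink.snowflakecomputing.com"))
```

A `None` result from inside the instance suggests the endpoint infrastructure or Snowflake-side approval isn’t complete yet.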

An Azure-hosted Databricks database#

To configure Azure Private Link for a Databricks database hosted on Azure:

  1. First, verify that your Databricks account and workspace meet Azure’s requirements for enabling Private Link.

Ensure your Azure region is available in Dataiku Cloud#

  1. In the Dataiku Cloud Launchpad, navigate to the Extensions panel.

  2. Click + Add an Extension.

  3. Select Azure Databricks endpoint.

  4. Select the Location. If the region you need isn’t available, please contact the support team to enable it.

Fill out the Azure Databricks endpoint form#

Retrieve the resource group, the subscription ID, the URL, and the name from your Azure Databricks Service page.

Note

  • Endpoint name is the name you want to give to your extension on Dataiku Cloud.

  • Databricks name is the name of the Databricks Workspace you created on Azure.

  • Databricks domain name is the domain name in the URL, without the https:// scheme.

Accept the Private Link request on Azure#

Navigate to the Private Link Center in your Azure account, then to Pending Connections, and accept the connection requests. Private Link functionality will only be enabled after you accept these requests.

Create the SQL connection#

You can now use the endpoint you created both in new and existing Databricks connections:

  1. In the Dataiku Cloud Launchpad, navigate to the Connections panel.

  2. Select + Add a Connection.

  3. Select Databricks and fill the required fields. The host should be your Databricks domain name.

An Azure-hosted arbitrary data source#

Administrators can leverage Azure Private Link to expose any service running inside their VNets to Dataiku Cloud.

Ensure your Azure region is available in Dataiku Cloud#

  1. In the Dataiku Cloud Launchpad, navigate to the Extensions panel.

  2. Click + Add an Extension.

  3. Select Azure private endpoint.

  4. Select the Azure Location. If the location you need isn’t available, please contact the support team to enable it.

Configure the private link with Dataiku#

  1. Retrieve the resource group, the subscription ID, and the name for your private link service from the Azure Private Link Center page.

  2. Go to the Extensions panel of the Dataiku Cloud Launchpad.

  3. Click + Add an Extension.

  4. Select Azure private endpoint.

  5. Fill out the Azure private endpoint form.

Note

  • Name is the name you want to give to your extension on Dataiku Cloud.

  • Private link service name is the name of the private link service you created on Azure.

  6. Click Add.

  7. Accept the Private Link request on Azure (Azure Private Link Center > Pending Connections).

Create the connection#

You can now use the endpoint you created both in new and existing connections:

  1. In the Dataiku Cloud Launchpad, navigate to the Extensions panel.

  2. Find the Azure private endpoint that you want to use. Click on the three dots button on the right. Then select View Details.

  3. Once the private endpoint is created in the background, you should be able to find the private endpoint IP at the bottom of the modal.

  4. Navigate to the Connections panel.

  5. Click Add a Connection or select the one you want to edit.

  6. Fill out the form, and use the retrieved private endpoint IP in the private link extension as the host parameter.
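Before saving the connection, you can verify that the service behind the private endpoint is reachable from inside your Dataiku instance (for example, from a Python notebook there). A hedged sketch — the IP and port in the comment are hypothetical placeholders for your own values:

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical private endpoint IP and service port -- substitute your own:
# print(port_open("10.0.1.25", 5432))
```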

An Azure SQL database#

Administrators can leverage Azure Private Link for the following Azure SQL databases: PostgreSQL, MySQL, or SQLServer.

Note

  • Depending on your Azure SQL database type, Private Link may only be available for servers with public access networking. For example, Azure Database for PostgreSQL and MySQL don’t support creating Private Links for servers configured with private access (VNet integration). This is an Azure limitation. Please refer to Microsoft’s documentation on Private Link for Azure Database for PostgreSQL or MySQL.

  • When using Azure SQL Database, each database belongs to a logical server. In Dataiku Cloud, every Azure SQL Endpoint is linked to an Azure SQL Database logical server. If your databases are on the same logical server, you can use a single Azure SQL Endpoint for all. For more information, please refer to Microsoft’s article What’s a logical server in Azure SQL Database and Azure Synapse?.

Ensure your Azure region is available in Dataiku Cloud#

  1. In the Dataiku Cloud Launchpad, navigate to the Extensions panel.

  2. Click + Add an Extension.

  3. Select Azure SQL endpoint.

  4. Select the SQL location. If the region you need isn’t available, please contact the support team to enable it.

Fill out the Azure SQL endpoint form#

Retrieve the resource group, the subscription ID, the SQL type, and the SQL server name from your Azure database page.

Note

  • Endpoint name is the name you want to give to your extension on Dataiku Cloud.

  • SQL type is the type of the SQL database you created on Azure. Only Azure Database for PostgreSQL flexible servers, Azure Database for PostgreSQL servers, Azure Database for MySQL servers, Azure Database for MySQL flexible servers, and Azure SQL Database are supported.

  • SQL full server name is the “Server name” found on your database page.

Accept the Private Link request on Azure#

Navigate to the Networking tab of your database to accept the connection requests. Private Link functionality will only be enabled after you accept these requests.

Create the SQL connection#

You can now use the endpoint you created both in new and existing PostgreSQL, MySQL, or SQLServer connections:

  1. In the Dataiku Cloud Launchpad, navigate to the Connections panel.

  2. Select + Add a Connection.

  3. Select PostgreSQL, MySQL, or SQLServer depending on the connection you want, and fill the form. The value for the host parameter is the host found on your Azure database page.
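Connections can also be created programmatically through the Dataiku public API, which is convenient when registering several databases behind the same endpoint. This is a sketch under assumptions: it presumes the `dataikuapi` package and `DSSClient.create_connection`, and the `params` keys below are illustrative — check an existing connection of the same type on your instance for the exact names.

```python
def create_private_sql_connection(client, name, sql_type, host, db, user,
                                  port="5432"):
    """Create a SQL connection pointing at a Private Link host.

    `client` is expected to behave like dataikuapi.DSSClient. The params
    keys are assumptions -- verify them against an existing connection.
    """
    params = {"host": host, "port": port, "db": db, "user": user}
    return client.create_connection(name, type=sql_type, params=params)


# Typical (hypothetical) usage against a real instance:
# import dataikuapi
# client = dataikuapi.DSSClient("https://your-instance.dataiku.io", "API_KEY")
# create_private_sql_connection(client, "pg_private", "PostgreSQL",
#                               "my-server.postgres.database.azure.com",
#                               "analytics", "dataiku")
```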

An Azure managed database#

Administrators can leverage Azure Private Link for an Azure managed database.

Ensure your Azure region is available in Dataiku Cloud#

  1. In the Dataiku Cloud Launchpad, navigate to the Extensions panel.

  2. Click + Add an Extension.

  3. Select Azure managed database endpoint.

  4. Select the Azure Location. If the region you need isn’t available, please contact the support team to enable it.

Fill out the Azure managed database endpoint form#

Retrieve the resource group, the subscription ID, and the host from your Azure managed database page.

Note

  • Endpoint name is the name you want to give to your extension on Dataiku Cloud.

  • Host is the host of your managed database. It ends with .database.windows.net.

Accept the Private Link request on Azure#

  1. Navigate to the Private Link Center in your Azure account.

  2. Go to Pending Connections to accept the connection requests.

Private Link functionality will only be enabled after you accept this request.

Create the SQLServer connection#

You can now use the endpoint you created both in new and existing SQLServer connections:

  1. In the Dataiku Cloud Launchpad, navigate to the Connections panel.

  2. Select + Add a Connection.

  3. Select SQLServer and complete the form. The value for the host parameter is the same host you use for the endpoint.

An Azure Synapse Analytics workspace#

Administrators can leverage Azure Private Link for an Azure Synapse Analytics workspace.

Ensure your Azure region is available in Dataiku Cloud#

  1. In the Dataiku Cloud Launchpad, navigate to the Extensions panel.

  2. Click + Add an Extension.

  3. Select Azure Synapse endpoint.

  4. Select the Azure Location. If the region you need isn’t available, please contact the support team to enable it.

Fill out the Azure Synapse endpoint form#

Retrieve the resource group, the subscription ID, and the workspace name from your Azure Synapse workspace page.

Note

  • Endpoint name is the name you want to give to your extension on Dataiku Cloud.

  • Synapse workspace name is the name of the Synapse workspace you created on Azure.

  • SQL pool type is the type of SQL pool that will be used for the Private Link connection; it can be either serverless or dedicated.

Accept the Private Link request on Azure#

  1. Navigate to the Private Link Center in your Azure account.

  2. Go to Pending Connections to accept the connection request.

Private Link functionality will only be enabled after you accept this request.

Create the Synapse connection#

You can now use the endpoint you created both in new and existing Synapse connections:

  1. In the Dataiku Cloud Launchpad, navigate to the Connections panel.

  2. Select + Add a Connection.

  3. Select Synapse and complete the form.

Note

You can find the value for the host from your Azure Synapse workspace page. It depends on the SQL pool type you choose in the extension. You should use:

  • Dedicated SQL endpoint if the SQL pool type is Dedicated SQL Pool;

  • Serverless SQL endpoint if the SQL pool type is Serverless SQL Pool.
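The mapping between pool type and host can be captured in a small helper. The host patterns below follow Azure Synapse’s usual endpoint naming, but treat them as assumptions and verify against the values shown on your own workspace page:

```python
def synapse_host(workspace, pool_type):
    """Return the assumed Synapse SQL endpoint host for a workspace.

    Patterns follow typical Azure Synapse endpoint naming; confirm them
    against your workspace page before relying on them.
    """
    if pool_type == "dedicated":
        return f"{workspace}.sql.azuresynapse.net"
    if pool_type == "serverless":
        return f"{workspace}-ondemand.sql.azuresynapse.net"
    raise ValueError(f"unknown SQL pool type: {pool_type}")
```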

An on-premises data source#

You can configure Azure Private Link for on-premises data sources if you have access to an Azure account:

  1. Connect your on-premises data source to your VNet as described in the Azure documentation on connecting an on-premises network to Azure.

  2. Follow the steps from An Azure-hosted arbitrary data source to connect to your data source.

Copyright © 2025, Dataiku