Dataiku Knowledge Base

User Guide

  • Getting Started
    • Quick Starts
      • Quick Start | Dataiku for data preparation
      • Quick Start | Dataiku for machine learning
      • Quick Start | Dataiku for MLOps
      • Quick Start | Dataiku for AI collaboration
      • Quick Start | Excel to Dataiku
        • Concept | From Excel to Dataiku
      • Quick Start | Alteryx to Dataiku
      • Quick Start | Dataiku for manufacturing data preparation and visualization
    • Dataiku User Interface
      • Concept | Dataiku Cloud Launchpad
      • Concept | Dataiku Design homepage
      • Concept | Project
      • Concept | Flow
      • Concept | Searching in Dataiku
      • Concept | Flow views, search, and filter
      • Tutorial | Explore the Flow
      • Tutorial | Flow zones
      • Reference | Navigation bar
      • Reference | Right panel navigation
      • How-to | Duplicate a Dataiku project
      • How-to | Find the Dataiku version
      • How-to | Rearrange Flow zones
      • Tip | Flow navigation shortcuts
      • Tip | Anchoring for Flow management
      • Tip | Hide or show Flow items
      • Tip | Using project folders
  • Data Sourcing
    • Data Connections
      • Concept | Data connections
      • Concept | Architecture model for databases
      • Concept | Connection changes
      • Tutorial | Configure a connection between Dataiku and an SQL database
      • Tutorial | Data transfer with visual recipes
      • Reference | A primer on connecting to data sources
      • How-to | Remap a connection when importing a project to a Dataiku instance
      • How-to | Utilize MS Access
    • Dataiku Datasets
      • Concept | Dataiku datasets
      • Concept | Dataset characteristics
      • Concept | Sampling on datasets
      • Concept | Dataset conditional formatting
      • Concept | Analyze data quality in the Explore tab
      • Tutorial | Getting started with datasets
      • How-to | Rename a dataset
      • How-to | Reorder or hide dataset columns
      • How-to | Export a filtered dataset
      • How-to | Apply a filter to summary statistics in the Analyze window
      • Tip | Good dataset naming schemes
      • FAQ | Why can’t I drag a folder into Dataiku?
      • FAQ | Where can I see how many records are in my entire dataset?
  • Data Preparation
    • Visual Recipes
      • Concept | Recipes in Dataiku
      • Concept | Sync recipe
      • Concept | Group recipe
      • Concept | Join recipe
      • Concept | Distinct recipe
      • Concept | Pivot recipe
      • Concept | Sample/Filter recipe
      • Concept | Sort recipe
      • Concept | Split recipe
      • Concept | Stack recipe
      • Concept | Top N recipe
      • Concept | Window recipe
      • Concept | Fuzzy join recipe
      • Concept | Geo join recipe
      • Concept | Labeling recipe
      • Concept | Common steps in visual recipes: Pre-filter, Post-filter, & Computed columns
      • Concept | Dynamic dataset and recipe repeat
      • Concept | Generate recipes using Generative AI
      • Tutorial | Group recipe
      • Tutorial | Join recipe
      • Tutorial | Distinct recipe
      • Tutorial | Pivot recipe
      • Tutorial | Top N recipe
      • Tutorial | Window recipe
      • Tutorial | Fuzzy join recipe
      • Tutorial | In-database data visualization and preparation
      • Tutorial | Geo join recipe
      • Tutorial | Compute isochrones and routes with the Geo Router plugin
      • Tutorial | Working with shapefiles and US census data
      • Tutorial | Dynamic recipe repeat
      • How-to | Insert or delete a recipe within the Flow
      • How-to | Segment your data using statistical quantiles
    • Prepare Recipe
      • Concept | Prepare recipe
      • Tutorial | Prepare recipe
      • Tutorial | Smart pattern builder for string pattern extraction
      • Tutorial | Visual logic processors for data preparation
      • Tutorial | Geographic processors
      • Tutorial | Enrich web logs in the Prepare recipe
      • Reference | Performing joins in the Prepare recipe
      • Reference | Using custom Python functions in the Prepare recipe
      • Reference | Handling decimal notations
      • How-to | Normalize number formats in a Prepare recipe
      • How-to | Handle accounting-style negative numbers
      • How-to | Copy-paste Prepare recipe steps
      • How-to | Apply Prepare steps to multiple columns
      • How-to | Standardize text fields using fuzzy values clustering
      • How-to | Reshape data from wide to long format
      • How-to | Generate Prepare recipe steps with AI
    • Dataiku Formulas
      • Concept | Dataiku formulas
      • Concept | Dataiku formulas cheat sheet
      • Concept | Safe sums across columns in Dataiku formulas
      • Tutorial | Relative referencing in Dataiku formulas
      • How-to | Remove scientific notation in a column
      • How-to | Pad a number with leading zeros
      • How-to | Fill empty cells of a column with the value of the corresponding row from another column
      • FAQ | In a formula, how can I check if a variable belongs to a set of values?
    • Data Pipelines & Computation Engines
      • Concept | Computation engines
      • Concept | Build modes
      • Concept | Data pipeline optimization
      • Concept | Where computation happens in Dataiku
      • Tutorial | Build modes
      • Tutorial | Recipe engines
      • How-to | Access job information
      • How-to | Enable SQL pipelines in the Flow
    • The Lab
      • Concept | Visual analyses in the Lab
      • Tutorial | Visual analyses in the Lab
    • Managing Dates
      • Concept | Date handling in Dataiku
      • Reference | How Dataiku handles and displays date and time
    • From Excel to Dataiku
      • Tutorial | Relative referencing in Dataiku formulas
      • How-to | Work with editable datasets
      • How-to | Import an Excel workbook
      • Reference | Data cleaning
      • Reference | Using formulas
      • Reference | Working with dates
      • Reference | Removing duplicates
      • Reference | Filtering rows
      • Reference | Sampling rows
      • Reference | Split a dataset
      • Reference | Append datasets
      • Reference | Joining datasets
      • Reference | Aggregate and pivot
      • Reference | Sorting values
      • Reference | Top values
    • From Alteryx to Dataiku
      • Reference | Alteryx to Dataiku concept mapping
  • Data Visualization
    • Charts
      • Concept | Charts
      • Concept | In-database charts
      • Tutorial | Charts
      • Tutorial | Pivot tables
      • Tutorial | Paneled and animated charts
      • Tutorial | Custom aggregation for charts
      • Tutorial | No-code maps
      • FAQ | How do I display non-aggregated metrics in charts?
      • FAQ | How do I sort on a measure not displayed in charts?
    • Dashboards
      • Concept | Dashboards
      • Tutorial | Use dashboards to build reports
      • Tutorial | Dashboard management
      • How-to | Manage sampling on insights
      • Reference | Understand source data for filters
      • Troubleshoot | Can’t display a web content insight in a dashboard
    • Webapps
      • Concept | Webapps
      • How-to | Display an image in a Bokeh webapp
    • Static Insights
      • Concept | Static insights
      • Tutorial | Static insights
    • Visualization Plugins
      • Concept | Data visualization plugins
  • Collaboration
    • Collaboration Overview
      • Concept | Collaboration
    • Wikis & Flow Documentation
      • Concept | Explain the Flow with generative AI
      • Concept | Workflow documentation in a wiki
      • Reference | Using the project wiki
      • Reference | Sharing and promoting wikis
      • How-to | Create a wiki article
      • How-to | Export a wiki to a PDF
      • How-to | Generate and export Flow documentation
      • Tip | Link Dataiku objects in a wiki article
    • Tags & Object Descriptions
      • Concept | Tags
      • Tip | Suggestions for using tags
      • Tip | Commenting to document Dataiku objects
    • Sharing Projects & Dataiku Assets
      • Concept | Project permissions and asset sharing
      • Concept | Data Catalog
      • Reference | Managing project access
      • How-to | Set up limited access to projects
      • How-to | Manage project access requests
      • How-to | Share project to non-Dataiku users
      • How-to | Manage object sharing
      • How-to | Enable quick sharing of datasets and objects
      • How-to | Copy Flow items to a new or existing project
    • Discussions
      • Concept | Discussions
      • Reference | Managing discussions
      • How-to | Start discussions in a Dataiku object
    • Workspaces
      • Concept | Workspaces
      • Reference | Centralized versus delegated workspaces
      • How-to | Create a workspace
      • How-to | Share a workspace to non-Dataiku users
    • Project Version Control
      • Concept | Version control for Dataiku projects
      • Tutorial | Git for projects
      • How-to | Undo actions in Dataiku
    • Stories
      • Concept | Dataiku stories
      • Tutorial | Dataiku stories with Generative AI
      • Tutorial | Dataiku stories
      • Reference | Story user interface
      • How-to | Enable Story AI
      • How-to | Import a story
  • Data Quality & Automation
    • Variables
      • Concept | Variables in Dataiku
      • Tutorial | Project variables in visual recipes
      • Tutorial | Coding with variables
    • Data Quality
      • Concept | Metrics
      • Concept | Checks
      • Concept | Data quality rules
      • Concept | Metrics & checks (pre-12.6)
      • Concept | Data lineage
      • Tutorial | Data quality
      • Tutorial | Custom metrics, checks, and data quality rules
      • Tutorial | Data quality and SQL metrics
      • FAQ | What’s the difference between distinct and unique value count metrics?
    • Automation Scenarios
      • Concept | Automation scenarios
      • Concept | Custom metrics, checks, data quality rules & scenarios
      • Tutorial | Automation scenarios
      • Tutorial | Scenario reporters
      • Tutorial | Webhook reporters in scenarios
      • Tutorial | Custom step-based scenarios
      • Tutorial | Custom script scenarios
      • How-to | Automate documentation exports in a scenario
      • How-to | Build missing partitions with a scenario
      • Code Sample | Set a timeout for a scenario build step
      • Code Sample | Set email recipients in a “Send email” reporter
      • FAQ | Can I control which datasets in my Flow get rebuilt during a scenario?
    • Dataiku Applications
      • Concept | Dataiku applications
      • Tutorial | Dataiku applications
      • Reference | Use cases of Dataiku applications
    • Partitioning
      • Concept | Partitioning
      • Concept | How partitioning adds value
      • Concept | Partitioned datasets
      • Concept | Jobs with partitioned datasets
      • Concept | Partitioning by pattern
      • Concept | Partitioning in a scenario
      • Concept | Partition redispatch and collection
      • Tutorial | File-based partitioning
      • Tutorial | Column-based partitioning
      • Tutorial | Partitioning in a scenario
      • Tutorial | Repartition a non-partitioned dataset
      • Tip | Interacting with partitioned datasets using the Python API
  • Machine Learning & Analytics
    • Interactive Statistics
      • Concept | Statistics worksheets
      • Concept | Statistics cards
      • Concept | Generate statistics recipe
      • Concept | Variable types for interactive statistics
      • Concept | Factor and response roles in statistics cards
      • Concept | Statistics cards for fit curves and distributions
      • Concept | Correlation matrices in statistical worksheets
      • Concept | Principal Component Analysis (PCA)
      • Concept | Hypothesis testing
      • Concept | Hypothesis test categories
      • Concept | Grouping variables in statistical testing
      • Concept | Adjustment methods for hypothesis test cards
      • Tutorial | Interactive statistics
      • How-to | Export a statistics card as a recipe
    • Machine Learning Concepts
      • Concept | Introduction to machine learning
      • Concept | Predictive modeling
      • Concept | Model validation
      • Concept | Model evaluation
      • Concept | Regression algorithms
      • Concept | Classification algorithms
      • Concept | Clustering algorithms
    • Feature Engineering
      • Concept | Data preparation for machine learning
      • Concept | Generate Features recipe
      • Tutorial | Generate Features recipe
      • Tutorial | Events aggregator plugin
    • AutoML Model Design
      • Concept | Quick models in Dataiku
      • Concept | The Design tab within the visual ML tool
      • Concept | Features handling
      • Concept | Multimodal ML using LLMs
      • Concept | Feature generation & reduction
      • Concept | Algorithm and hyperparameter selection
      • Concept | ML diagnostics
      • Concept | ML assertions
      • Tutorial | Machine learning basics
      • Tutorial | Model overrides
      • Tutorial | ML diagnostics
      • Tutorial | ML assertions
      • Tutorial | Clustering (unsupervised) models with visual ML
      • Tutorial | MLlib with Dataiku
      • How-to | Distributed hyperparameter search
      • FAQ | How does the AutoML tool automatically select or reject features when training a model?
      • Troubleshoot | In visual ML, I get the error “All values of the target are equal” when they’re not
    • AutoML Model Results
      • Concept | The Result tab within the visual ML tool
      • Concept | Model summaries within the visual ML tool
      • Concept | Explainable AI
      • Concept | Partial dependence plots
      • Concept | Subpopulation analysis
      • Concept | Individual prediction explanations
      • Concept | What if? analysis
      • Concept | Advanced What if? simulators
      • Concept | Interpretation of regression model output
      • Tutorial | Advanced What if simulators
      • Tutorial | Exporting a model’s preprocessed data with a Jupyter notebook
      • How-to | Set up What if analysis for a dashboard consumer
      • FAQ | Why don’t the values in the Visual ML chart match the final scores for each algorithm?
    • Model Scoring
      • Concept | Model deployment to the Flow
      • Concept | Scoring data
      • Concept | Model validation and evaluation
      • Tutorial | Model scoring basics
    • Custom Models in Visual ML
      • Concept | Custom preprocessing within the visual ML tool
      • Concept | Custom modeling within the visual ML tool
      • Concept | Tuning XGBoost models in Python
      • Tutorial | Custom preprocessing & modeling within visual ML
      • Tutorial | Azure AutoML from a Dataiku notebook
    • Time Series
      • Concept | Introduction to time series
      • Concept | Time series data types and formats
      • Concept | Time series components
      • Concept | Objectives of time series analysis
      • Concept | Time series analysis with interactive statistics
      • Concept | Time series preparation
      • Concept | Time series resampling
      • Concept | Time series interval extraction
      • Concept | Time series windowing
      • Concept | Time series extrema extraction
      • Concept | Time series forecasting
      • Tutorial | Time series analysis
      • Tutorial | Time series forecasting (Visual ML)
      • Tutorial | Time series preparation
      • Tutorial | Forecasting time series data with R and Dataiku
      • Tutorial | Deep learning for time series
      • Tutorial | Export preprocessed data (for time series models)
    • Causal Prediction
      • Concept | Causal prediction
      • Tutorial | Causal prediction
    • Text Processing
      • Concept | Regular expressions in Dataiku
      • Concept | Introduction to natural language processing (NLP)
      • Concept | Challenges of natural language processing (NLP)
      • Concept | Cleaning text data
      • Concept | Handling text features for machine learning
      • Tutorial | Build a text classification model
    • Images
      • Concept | Pre-trained image classification models
      • Concept | Optimization of image classification models
      • Concept | Object detection
      • Tutorial | Image classification without code
      • Tutorial | Image classification with code
      • Tutorial | Object detection without code
      • How-to | Prepare images for use in a model
    • Geospatial Analytics
      • Concept | Geo join recipe
      • Tutorial | Geographic processors
      • Tutorial | No-code maps
      • Tutorial | Geo join recipe
      • Tutorial | Compute isochrones and routes with the Geo Router plugin
      • Tutorial | Working with shapefiles and US census data
      • Reference | Overview of Dataiku’s geospatial features
    • Partitioned Models
      • Concept | Partitioned models
      • Tutorial | Partitioned models
      • How-to | Train a stratified or partitioned model
    • Deep Learning
      • Tutorial | Deep learning within visual ML
      • Tutorial | Deep learning for time series
    • Active Learning
      • Tutorial | Active learning for classification problems
      • Tutorial | Active learning for object detection problems
      • Tutorial | Help on active learning webapp
      • Tutorial | Active learning for object detection problems using Dataiku apps
      • Tutorial | Active learning for tabular data classification problems using Dataiku apps
    • Responsible AI
      • Concept | Responsible AI
      • Concept | Dangers of irresponsible AI
      • Concept | Responsible AI in the data science practice
      • Concept | Basics of bias
      • Concept | Model fairness
      • Concept | Evaluating group fairness
      • Concept | Interpretability
      • Concept | Model transparency
      • Concept | Deployment biases
      • Tutorial | Responsible AI training
      • Reference | RAI further reading
  • Generative AI and Large Language Models (LLMs)
    • LLM Administration
      • Concept | LLM connections
      • Concept | Guardrails against risks from Generative AI and LLMs
    • Text Processing with Visual LLM Recipes
      • Concept | Large language models and the LLM Mesh
      • Concept | Classify text recipe
      • Concept | Summarize text recipe
      • Concept | Prompt Studios and Prompt recipe
      • Tutorial | Classify text with Generative AI
      • Tutorial | Summarize text with Generative AI
      • Tutorial | Prompt engineering with LLMs
      • Tutorial | Processing text with the Prompt recipe
    • Retrieval Augmented Generation (RAG)
      • Concept | Embed recipes and Retrieval Augmented Generation (RAG)
      • Tutorial | Retrieval Augmented Generation (RAG) with the Embed dataset recipe
      • Tutorial | Build a multimodal knowledge bank for a RAG project
      • Tutorial | Build a conversational interface with Dataiku Answers
    • LLMOps
      • Tutorial | LLM evaluation
  • Code
    • Getting Started with Code in Dataiku
      • Concept | Code notebooks
      • Concept | Code recipes
      • Tutorial | Code notebooks and recipes
    • Python and Dataiku
      • Tutorial | Code notebooks and recipes
      • Tutorial | SQL from a Python recipe in Dataiku
      • Tutorial | Sessionization in SQL, Hive, Python, and Pig
      • Tutorial | PySpark in Dataiku
      • Reference | Reading or writing a dataset with custom Python code
      • How-to | Enable auto-completion in a Jupyter notebook
      • Code Sample | Access info about datasets
    • SQL and Dataiku
      • Concept | SQL notebooks
      • Concept | SQL code recipes
      • Concept | AI SQL Assistant
      • Tutorial | SQL notebooks and recipes
    • R and Dataiku
      • Tutorial | Dataiku for R users
      • Tutorial | R Markdown reports
      • Tutorial | Forecasting time series data with R and Dataiku
      • Tutorial | R Shiny webapps
      • Reference | Upgrading and rolling back the R version used in Dataiku
      • How-to | Edit Dataiku recipes in RStudio
      • Troubleshoot | R recipes aren’t working after upgrading or migrating the instance
    • Work Environment
      • Concept | Code environments
      • Concept | External IDE integrations
      • Tutorial | My first Code Studio
      • How-to | Create a code environment
      • How-to | Set a code environment
      • How-to | Edit Dataiku projects and plugins in VS Code
      • How-to | Edit Dataiku projects and plugins in PyCharm
      • How-to | Edit Dataiku projects and plugins in Sublime
      • How-to | Edit Dataiku recipes in RStudio
      • FAQ | Why should I use a code environment?
    • Shared Code
      • Concept | Introduction to shared code
      • Concept | Shared code libraries
      • Concept | Importing code from a remote Git repository
      • Concept | Code samples
      • Tutorial | Shared code
      • Tutorial | Cloning a library from a remote Git repository
      • How-to | Import a notebook from GitHub
      • Tip | Best practices for notebook development between GitHub and Dataiku
    • Dataiku APIs
      • Concept | Dataiku APIs
      • Concept | The dataiku package
      • Concept | Dataiku public API
      • Concept | Usage of Dataiku APIs outside of Dataiku
      • Tutorial | Dataiku public API
      • Tip | Using the API within Dataiku (Basics)
      • Tip | Automating work in Dataiku with the API
      • Tip | Administering Dataiku remotely
    • Managed Folders
      • Concept | Managed folders
      • Tutorial | Managed folders
  • MLOps & Operationalization
    • MLOps Architecture
      • Concept | Definition, challenges, and principles of MLOps
      • Concept | How model development impacts MLOps
      • Concept | Model packaging for deployment
      • Concept | Dataiku architecture for MLOps
    • Batch Deployment
      • Concept | Automation node preparation
      • Concept | Batch deployment
      • Tutorial | Batch deployment
    • Test Scenarios
      • Tutorial | Test scenarios
    • API Deployment
      • Concept | Real-time APIs
      • Concept | API endpoints
      • Concept | API query enrichments
      • Concept | API Deployer
      • Tutorial | Real-time API deployment
    • Model Monitoring
      • Concept | Process governance for MLOps
      • Concept | Model comparisons
      • Concept | Model evaluation stores
      • Concept | Monitoring model performance and drift in production
      • Concept | Monitoring and feedback in the AI project lifecycle
      • Tutorial | Model monitoring with a model evaluation store
      • Tutorial | API endpoint monitoring
      • Tutorial | Model monitoring in different contexts
      • Tutorial | Deployment automation
      • FAQ | How can I get model monitoring metrics in a dataset format?
    • External Models
      • Tutorial | Surface external models within Dataiku
    • Dataiku Govern
      • Concept | Introducing Dataiku Govern
      • Concept | Centralization in Dataiku Govern
      • Concept | Governance layers
      • Concept | Govern item pages
      • Concept | Workflows and project qualification
      • Concept | Governed projects
      • Concept | Business initiatives
      • Concept | Sign-off process
      • Concept | Model maintenance in Dataiku Govern
      • Concept | Govern roles and permissions
      • Concept | Customizing a Dataiku Govern instance
      • Tutorial | Dataiku Govern framework
      • Tutorial | Govern roles and permissions
      • Tutorial | Blueprint Designer
      • Tutorial | Custom Pages Designer
      • Tutorial | Use imported templates in the Blueprint Designer
      • How-to | Export Govern items
      • How-to | Switch artifact templates (blueprint versions)
      • How-to | Subscribe to email notifications
      • How-to | Export and import blueprint and blueprint versions
      • How-to | Add role assignment rules to a Govern item
      • Tip | Embed a dashboard in Dataiku Govern
    • CI/CD Pipelines
      • Tutorial | Getting started with CI/CD pipelines with Dataiku
      • Tutorial | Jenkins pipeline for API services in Dataiku
      • Tutorial | Jenkins pipeline for Dataiku with the Project Deployer
      • Tutorial | Azure pipeline for Dataiku with the Project Deployer
      • Tutorial | Jenkins pipeline for Dataiku without the Project Deployer
    • Feature Store
      • Tutorial | Building your feature store in Dataiku
      • How-to | Add a dataset to the feature store
      • How-to | Add a feature group to the Flow
  • Plugins
    • Plugin Usage
      • Concept | Plugin management
      • Concept | Plugins in Dataiku
      • How-to | Install a plugin
      • How-to | Update a plugin
      • FAQ | Are plugins supported?
      • FAQ | Where can I find the details of a plugin?
    • Plugin Development
      • Concept | Plugin development
      • Concept | Development plugins
      • Concept | Git integration for plugins
      • Reference | Plugin naming policies and conventions
      • Reference | IDE setup to develop Dataiku plugins
      • How-to | Clone a plugin from a remote Git repository
      • How-to | Share a plugin as a zip archive
      • How-to | Edit a plugin
      • FAQ | Why should I create plugins?
      • FAQ | What are some examples of plugins?
      • FAQ | Where can I find the code for a plugin?

Dataiku Cloud

  • Space Management
    • Free Trials of Dataiku Cloud
      • How-to | Begin a free trial from Dataiku
      • How-to | Begin a free trial from Snowflake Partner Connect
      • Tip | Working with Snowflake Partner Connect sample projects
    • Users, Profiles & Groups on Dataiku Cloud
      • Reference | Permission management on Dataiku Cloud
      • How-to | Invite users to your Dataiku Cloud space
      • How-to | Automatically attribute profiles and groups to users
      • How-to | Automatically invite users to your instance
      • How-to | Use trial seats
      • How-to | Activate single sign-on (SSO)
      • Troubleshoot | The invited user didn’t receive an email
    • Support on Dataiku Cloud
      • How-to | Contact support on Dataiku Cloud
      • How-to | Grant Dataiku support access to your instance
      • FAQ | Should I email support -at- dataiku -dot- com if I need help?
    • Production Nodes on Dataiku Cloud
      • How-to | Install the Automation node
      • How-to | Install the API node
      • How-to | Use the referenced data deployment mode on Dataiku Cloud
      • How-to | Deploy an API service from the Automation node on Dataiku Cloud
  • Data Transfer and Security on Dataiku Cloud
    • Reference | Relocatable datasets
    • Reference | Data transfer between cloud storage locations
    • How-to | Secure data connections through AWS PrivateLink
    • How-to | Secure data connections through Azure Private Link
    • How-to | Secure data connections through GCP Private Service Connect
    • How-to | Restrict access to Dataiku Cloud IP addresses
    • How-to | Access data sources through a VPN server
  • Compute and Resource Quotas on Dataiku Cloud
    • Reference | Overview of compute engines on Dataiku Cloud
    • Reference | Leveraging fully managed elastic AI compute
    • Reference | Managing elastic AI compute capacity
    • Reference | Managing containerized execution configurations
    • Reference | Resource quota management
    • Tip | Choosing container sizes
    • Tip | Using Spark
    • Troubleshoot | The job takes an unusually long time to complete
    • Troubleshoot | The job queues for a long time and then fails without ever starting

Additional Offerings

  • Dataiku Solutions
    • Retail & CPG
      • Solution | Customer Satisfaction Reviews
      • Solution | Demand Forecast
      • Solution | Distribution Spatial Footprint
      • Solution | Market Basket Analysis
      • Solution | Product Recommendation
      • Solution | Customer Lifetime Value Forecasting
      • Solution | RFM Segmentation
      • Solution | Inventory Allocation Optimization with Grid Dynamics
      • Solution | Markdown Optimization
      • Solution | Store Segmentation
    • Financial Services & Insurance
      • Solution | AML Alerts Triage
      • Solution | Credit Card Fraud
      • Solution | Insurance Claims Modeling
      • Solution | Credit Scoring
      • Solution | Customer Segmentation for Banking
      • Solution | Interactive Document Intelligence for ESG
      • Solution | News Sentiment Stock Alert System
      • Solution | Next Best Offer for Banking
      • Solution | Credit Risk Stress Testing (CECL, IFRS9)
      • Solution | Lead Scoring
    • Health & Life Sciences
      • Solution | Optimizing Omnichannel Marketing
      • Solution | Pharmacovigilance
Getting Started

If you want a quick hands-on introduction to Dataiku, check out the following task-based quick start guides.

If you want help orienting yourself to the Dataiku interface, see the section on UI resources.

Tip

Another great starting point is to follow our learning paths on Dataiku Academy to upskill through courses and certifications.

Topics

  • Quick Starts
  • Dataiku User Interface
Copyright © 2025, Dataiku