You are viewing the Knowledge Base for version 10.0 of DSS.
Preparing Data with Code Recipes
Articles
Concept: SQL Recipe
Using PySpark in DSS
Using SparkR in DSS