You are viewing the Knowledge Base for version 9.0 of DSS.
Building Data Pipelines
Articles
Data Pipelines
Concept: Computation Engine
Concept: Jobs
Build Datasets
Where does it all happen?
How to enable SQL pipelines in the Flow