You are viewing the Knowledge Base for version 9.0 of DSS.

Building Data Pipelines

Articles

  • Data Pipelines
  • Concept: Computation Engine
  • Concept: Jobs
  • Build Datasets
  • Where does it all happen?
  • How to enable SQL pipelines in the Flow

© Copyright 2021, Dataiku.
