Dataiku Academy
You are viewing the Knowledge Base for version 8.0 of DSS.

Advanced Prepare Recipe Usage

Although it is easy to use, the Prepare recipe is also packed with powerful functionality that may not be immediately obvious.

The following articles build on the material already introduced in the Basics Courses.

The Prepare recipe is the focus of this section, but most of this material also applies to visual analyses in the Lab, which can be deployed to the Flow as Prepare recipes.

Articles

  • Handling Decimal Notations
  • Enriching Web Logs
  • Applying Prepare Steps to Multiple Columns
  • Performing Joins in the Prepare Recipe
  • Become a Master of Dataiku DSS Formulas
  • Custom Python functions in the Prepare Recipe
  • How to standardize text fields using fuzzy values clustering
  • How to fill empty cells of a column with the value of the corresponding row from another column
  • How to remove scientific notation in a column
  • How to pad a number with leading zeros
  • Safe sums across columns in Dataiku DSS Formulas
  • In a formula, how to check if a variable belongs to a set of values
  • How to copy-paste Prepare recipe steps
  • Dealing with Accounting-style negative numbers
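To make a few of these topics concrete, here is a minimal plain-Python sketch of the cell-level logic behind several of the articles above: padding with leading zeros, removing scientific notation, parsing accounting-style negatives, summing while treating empty cells as zero, and filling an empty cell from another column. It is an illustration only, not the DSS API and not the method the linked articles use. All function names are hypothetical, and it assumes cell values arrive as strings (with an empty string for an empty cell) and that commas are thousands separators rather than decimal marks.

```python
from decimal import Decimal


def pad_with_zeros(value, width):
    """Left-pad a number-like value with zeros (e.g. IDs or postal codes)."""
    return str(value).zfill(width)


def remove_scientific_notation(value):
    """Render a numeric string such as '1.5e-05' in fixed-point form."""
    # Decimal keeps the exact digits; the 'f' format never uses an exponent.
    return format(Decimal(value), "f")


def parse_accounting_negative(value):
    """Convert accounting-style text like '(1,234.50)' to a signed float."""
    # Assumption: commas are thousands separators, not decimal marks.
    text = value.strip().replace(",", "")
    if text.startswith("(") and text.endswith(")"):
        return -float(text[1:-1])
    return float(text)


def safe_sum(*cells):
    """Sum numeric cells, treating empty or missing cells as zero."""
    return sum(float(c) for c in cells if c not in (None, ""))


def fill_empty(row, target, source):
    """Return row[target], or row[source] when the target cell is empty."""
    value = row.get(target)
    return value if value not in (None, "") else row.get(source)
```

For example, `pad_with_zeros(42, 5)` returns `'00042'`, `remove_scientific_notation("1.5e-05")` returns `'0.000015'`, and `parse_accounting_negative("(1,234.50)")` returns `-1234.5`. The articles themselves describe the DSS-specific ways to apply this kind of logic inside a Prepare recipe.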

© Copyright 2021, Dataiku.
