You are viewing the Knowledge Base for version 11 of Dataiku.
Image Classification
Learn how to classify images with Dataiku, using either visual tools or more customized, code-based methods.
Topics
Image Classification with Visual Tools
Image Classification with Code / Deep Learning for Images