Work With Python on Dataiku Online

Dataiku Online lets you work extensively with Python. To do so, you can:

  • use the DSS built-in environment (as you would normally find on a Dataiku instance);

  • use one of the three additional Python environments (“datascience”, “dash”, and “Time Series”), which are available by default;

  • create a custom environment and choose the packages to include.

Note

You must be a space administrator to create or manage Python environments.
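
Whichever environment you select, it can help to confirm from a notebook or recipe which interpreter and package versions are actually in use. The following is a minimal sketch using only the Python standard library; the function name `environment_report` is hypothetical, and it assumes Python 3.8 or later (for `importlib.metadata`).

```python
import sys
from importlib import metadata


def environment_report(package_names):
    """Map each package name to its installed version (None if not installed)."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # Example package names; replace them with the ones your project needs.
    print(environment_report(["scikit-learn", "xgboost"]))
```

A package reported as `None` is a hint that the code is running in a different environment than expected, or that the package still needs to be added.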

How to Create a Python Environment

  • Navigate to the Launchpad and open the Python environments tab.

  • Click the Add a code environment button.

  • Choose the Python version and enter the required packages, one per line. Pin specific versions where possible to keep the code environment stable. You can add packages to an existing code environment later if needed.
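
Before pasting a package list into the form, you may want to check which entries have no version specifier. The sketch below is one possible approach using a simplified pattern (it is not a full requirement parser and does not handle extras or URLs); the function name `find_unpinned` is hypothetical.

```python
import re

# A package name followed by an optional version specifier, for example
# "scikit-learn>=0.20,<0.21" or "xgboost==0.82". Simplified on purpose.
_REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")
_OPERATORS = ("==", ">=", "<=", "~=", "!=", "<", ">")


def find_unpinned(requirements_text):
    """Return the names of packages listed without any version specifier."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and comments.
        match = _REQUIREMENT.match(line)
        if match is None:
            raise ValueError(f"Unrecognized requirement line: {line!r}")
        name, spec = match.groups()
        if spec == "":
            unpinned.append(name)
        elif not spec.startswith(_OPERATORS):
            raise ValueError(f"Unrecognized version specifier: {line!r}")
    return unpinned
```

For instance, `find_unpinned("pandas\nxgboost==0.82")` returns `["pandas"]`, flagging the entry that should probably be given a version.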

Packages to Install

A package can be installed only if it is publicly available through pip, that is, published on a public package index. Installation from wheel files is not possible.
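
To check in advance whether a package is publicly published, you can query the public index. The sketch below assumes the index in question is PyPI and uses its JSON endpoint (`https://pypi.org/pypi/<name>/json`); whether Dataiku Online resolves packages against PyPI directly is an assumption, and the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_pypi_metadata(name, timeout=10):
    """Fetch project metadata from PyPI; return None if the project is unknown."""
    url = PYPI_JSON_URL.format(name=name)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # No public project with this name.
        raise


def latest_public_version(name, fetch=fetch_pypi_metadata):
    """Return the latest version published on PyPI, or None if not found."""
    metadata = fetch(name)
    if metadata is None:
        return None
    return metadata["info"]["version"]
```

The `fetch` parameter exists only so the lookup can be replaced with a stub when no network access is available.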

Note

Some Python packages may require additional system dependencies. Dataiku Online doesn’t support these dependencies. Please contact support for more information.

Sets of Packages

When creating a code environment, Dataiku offers sets of packages commonly used together for certain kinds of machine learning tasks.

  • Visual Machine Learning (scikit-learn, XGBoost, LightGBM)

scikit-learn>=0.20,<0.21
scipy>=1.2,<1.3
xgboost==0.82
lightgbm>=3.2,<3.3
jinja2>=2.10,<2.11
MarkupSafe<2.1.0
itsdangerous<2.1.0
flask>=1.0,<1.1
cloudpickle>=1.3,<1.6
statsmodels>=0.10,<0.11

  • Visual Machine Learning with Bayesian search (scikit-learn, XGBoost, LightGBM, scikit-optimize)

scikit-optimize>=0.7,<0.8
scikit-learn>=0.20,<0.21
scipy>=1.2,<1.3
xgboost==0.82
lightgbm>=3.2,<3.3
jinja2>=2.10,<2.11
MarkupSafe<2.1.0
itsdangerous<2.1.0
flask>=1.0,<1.1
cloudpickle>=1.3,<1.6
statsmodels>=0.10,<0.11

  • Visual Machine Learning with sentence embedding (scikit-learn, XGBoost, LightGBM, sentence-transformers)

sentence-transformers>=2.1,<2.3
scikit-learn>=0.20,<0.21
scipy>=1.2,<1.3
xgboost==0.82
lightgbm>=3.2,<3.3
jinja2>=2.10,<2.11
MarkupSafe<2.1.0
itsdangerous<2.1.0
flask>=1.0,<1.1
cloudpickle>=1.3,<1.6
statsmodels>=0.10,<0.11

  • Visual Deep Learning: TensorFlow (CPU, and GPU with CUDA 11.2 + cuDNN 8.1)

tensorflow>=2.6.2,<3.0
scikit-learn>=0.20,<0.21
scipy>=1.2,<1.3
statsmodels>=0.10,<0.11
jinja2>=2.10,<2.11
MarkupSafe<2.1.0
itsdangerous<2.1.0
flask>=1.0,<1.1
pillow==6.2.2
cloudpickle>=1.3,<1.6
h5py==3.1.0
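
When you combine one of these sets with your own packages, listing a package that the set already constrains can lead to conflicting requirements. The sketch below shows one way to spot such overlaps before creating the environment. The helper names and the `extras` list are hypothetical; name comparison follows the usual pip convention of ignoring case and treating "-", "_" and "." as equivalent.

```python
import re


def requirement_name(line):
    """Return the normalized package name at the start of a requirement line."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
    if match is None:
        raise ValueError(f"Unrecognized requirement line: {line!r}")
    # Normalize: lowercase, with runs of "-", "_" and "." collapsed to "-".
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def overlapping_packages(package_set, extra_packages):
    """Return the sorted names that appear in both lists of requirements."""
    set_names = {requirement_name(l) for l in package_set if l.strip()}
    extra_names = {requirement_name(l) for l in extra_packages if l.strip()}
    return sorted(set_names & extra_names)


# The "Visual Machine Learning" set as listed above.
VISUAL_ML = [
    "scikit-learn>=0.20,<0.21",
    "scipy>=1.2,<1.3",
    "xgboost==0.82",
    "lightgbm>=3.2,<3.3",
    "jinja2>=2.10,<2.11",
    "MarkupSafe<2.1.0",
    "itsdangerous<2.1.0",
    "flask>=1.0,<1.1",
    "cloudpickle>=1.3,<1.6",
    "statsmodels>=0.10,<0.11",
]

if __name__ == "__main__":
    # Hypothetical additions a user might want to make.
    extras = ["pandas==1.3.5", "Flask==2.0.3", "scikit_learn==1.0.2"]
    print(overlapping_packages(VISUAL_ML, extras))
```

Any name this reports is already constrained by the set, so the extra entry should be removed or reconciled with the set's version range.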

How to Manage Your Python Environments

You can access your Python environments in the Python Environments tab of your Launchpad. From there, click the three-dot icon next to an entry to view the details of the Data Science and Dash environments or to edit your custom ones.

The Project Deployer manages the deployment of Python environments on the Automation node. The environments that a project requires are created during bundle creation. For more information, see the product documentation on how to deploy bundles with the Project Deployer.