Work With Python on Dataiku Cloud
Dataiku integrates deeply with Python, so you can write Python code in many parts of the platform. The resources below cover working with Python on Dataiku Cloud specifically.
Reference | Python environment management on Dataiku Cloud
You can find your Python environments in the Code Environments panel of your Launchpad. From there, you can access the details of any default or custom environment by clicking the three dots icon next to each environment.
You have three options for choosing a Python environment on Dataiku Cloud:
use the Dataiku built-in environment (as you would normally find on a Dataiku instance);
use one of the three additional Python environments (“datascience”, “dash”, and “Time series”), which are available by default;
create a custom one and choose the packages to include.
Note
You must be a space admin to create or manage Python environments.
Reference | Package presets for custom Python environments
For custom Python environments, Dataiku makes it easy to add sets of packages that are commonly used together for certain kinds of machine learning tasks. The presets and their contents are listed below.
Visual Machine Learning (scikit-learn, XGBoost, LightGBM)
scikit-learn>=0.20,<0.21
scipy>=1.2,<1.3
xgboost==0.82
lightgbm>=3.2,<3.3
jinja2>=2.10,<2.11
MarkupSafe<2.1.0
itsdangerous<2.1.0
flask>=1.0,<1.1
cloudpickle>=1.3,<1.6
statsmodels>=0.10,<0.11
Visual Machine Learning with Bayesian search (scikit-learn, XGBoost, LightGBM, scikit-optimize)
scikit-optimize>=0.7,<0.8
scikit-learn>=0.20,<0.21
scipy>=1.2,<1.3
xgboost==0.82
lightgbm>=3.2,<3.3
jinja2>=2.10,<2.11
MarkupSafe<2.1.0
itsdangerous<2.1.0
flask>=1.0,<1.1
cloudpickle>=1.3,<1.6
statsmodels>=0.10,<0.11
Visual Machine Learning with sentence embedding (scikit-learn, XGBoost, LightGBM, sentence-transformers)
sentence-transformers>=2.1,<2.3
scikit-learn>=0.20,<0.21
scipy>=1.2,<1.3
xgboost==0.82
lightgbm>=3.2,<3.3
jinja2>=2.10,<2.11
MarkupSafe<2.1.0
itsdangerous<2.1.0
flask>=1.0,<1.1
cloudpickle>=1.3,<1.6
statsmodels>=0.10,<0.11
Visual Deep Learning: TensorFlow (CPU, and GPU with CUDA 11.2 + cuDNN 8.1)
tensorflow>=2.6.2,<3.0
scikit-learn>=0.20,<0.21
scipy>=1.2,<1.3
statsmodels>=0.10,<0.11
jinja2>=2.10,<2.11
MarkupSafe<2.1.0
itsdangerous<2.1.0
flask>=1.0,<1.1
pillow==6.2.2
cloudpickle>=1.3,<1.6
h5py==3.1.0
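The presets above overlap heavily, which can be hard to see from the flat lists. The following sketch transcribes the four lists and reports what each preset adds to the plain Visual Machine Learning preset. The dictionary keys and the helper name `extra_packages` are hypothetical labels chosen for this example; they are not Dataiku identifiers.

```python
# Package lines transcribed from the preset lists above.
BASE = [
    "scikit-learn>=0.20,<0.21",
    "scipy>=1.2,<1.3",
    "xgboost==0.82",
    "lightgbm>=3.2,<3.3",
    "jinja2>=2.10,<2.11",
    "MarkupSafe<2.1.0",
    "itsdangerous<2.1.0",
    "flask>=1.0,<1.1",
    "cloudpickle>=1.3,<1.6",
    "statsmodels>=0.10,<0.11",
]

# Keys are hypothetical short names for the four presets.
PRESETS = {
    "visual_ml": BASE,
    "visual_ml_bayesian": ["scikit-optimize>=0.7,<0.8"] + BASE,
    "visual_ml_sentence_embedding": ["sentence-transformers>=2.1,<2.3"] + BASE,
    "visual_deep_learning_tensorflow": [
        "tensorflow>=2.6.2,<3.0",
        "scikit-learn>=0.20,<0.21",
        "scipy>=1.2,<1.3",
        "statsmodels>=0.10,<0.11",
        "jinja2>=2.10,<2.11",
        "MarkupSafe<2.1.0",
        "itsdangerous<2.1.0",
        "flask>=1.0,<1.1",
        "pillow==6.2.2",
        "cloudpickle>=1.3,<1.6",
        "h5py==3.1.0",
    ],
}


def extra_packages(preset, baseline="visual_ml"):
    """Return the lines of `preset` that are absent from `baseline`, in order."""
    base = set(PRESETS[baseline])
    return [line for line in PRESETS[preset] if line not in base]


if __name__ == "__main__":
    for name in PRESETS:
        print(name, "->", extra_packages(name))
```

Read this way, the Bayesian search and sentence embedding presets each add a single package to the base list, while the TensorFlow preset adds three packages and omits xgboost and lightgbm.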
How-to | Create a custom Python code environment
As a space admin, from the Launchpad of your space, navigate to the Code Envs panel.
Click Add a Python Code Environment.
Enter a name for the new code env, and choose the version of Python.
Enter the packages required (one package per line). Include specific versions where possible to ensure the stability of the code environment. You can later add packages to existing code environments if needed.
Note
You can also click Package Presets > Append to add groups of packages commonly used together for various types of machine learning tasks.
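Before pasting a package list into the form, it can help to check locally which lines carry no version constraint at all. The sketch below is one possible pre-check, not part of Dataiku: the function name `packages_without_version` is hypothetical, and it assumes the list uses pip-style specifiers (`==`, `>=`, `<`, and so on), as the presets do.

```python
import re
from typing import List

# Any pip-style comparison operator followed by something that looks like a version.
_SPECIFIER = re.compile(r"(==|~=|!=|>=|<=|<|>)\s*\S")


def packages_without_version(package_list: str) -> List[str]:
    """Return the non-empty lines that contain no version specifier."""
    unversioned = []
    for raw_line in package_list.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no package
        if not _SPECIFIER.search(line):
            unversioned.append(line)
    return unversioned


if __name__ == "__main__":
    sample = "scikit-learn>=0.20,<0.21\nrequests\n\nxgboost==0.82\nnumpy\n"
    print(packages_without_version(sample))
```

Lines reported by this check are candidates for an explicit pin or range. The check is purely textual and does not verify that the requested versions exist or are compatible with each other.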
FAQ | What packages are available for installation in a custom Python environment?
A package must be publicly available on PyPI (the index that pip installs from) to be installed in a custom Python environment. Installation from wheel files is not possible.
Note
Some Python packages may require additional system dependencies. Dataiku Cloud doesn’t support these dependencies. Please contact support for more information.
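One way to check the public-availability requirement before adding a package is to query PyPI's JSON endpoint from your own machine. This is a minimal sketch that assumes "publicly available" means published on the public PyPI index; the helper names `pypi_json_url` and `is_on_pypi` are hypothetical, and the check says nothing about system dependencies.

```python
import urllib.error
import urllib.parse
import urllib.request


def pypi_json_url(package_name: str) -> str:
    """Build the PyPI JSON metadata URL for a project name."""
    quoted = urllib.parse.quote(package_name, safe="")
    return "https://pypi.org/pypi/{}/json".format(quoted)


def is_on_pypi(package_name: str, timeout: float = 10.0) -> bool:
    """Return True if PyPI serves metadata for the project, False on HTTP 404.

    Requires network access. Other HTTP or connection errors are re-raised
    so that a network outage is not mistaken for a missing package.
    """
    try:
        with urllib.request.urlopen(pypi_json_url(package_name), timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


if __name__ == "__main__":
    for name in ("scikit-learn", "xgboost"):
        print(name, is_on_pypi(name))
```

A package that passes this check can still fail to install on Dataiku Cloud if it needs additional system dependencies, as the note above explains.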
Reference | Python environments on the Automation node
As with R environments, the Project Deployer manages the deployment of Python environments on the Automation node. The environments that a project requires are created during bundle creation.
Note
For more information, see the reference documentation on how to deploy bundles with the Project Deployer.