Upgrading the R version used in Dataiku
I have the R integration configured in Dataiku, but I want to upgrade the underlying version of R that is being used. How should this be handled, and are there any best practices or considerations to be aware of? Also, is it possible to roll back this upgrade in case something goes wrong?
Upgrading the base R version used in a particular Dataiku environment is generally a two-step process:
Upgrading the R distribution itself on the server, typically with the system package manager (such as yum or apt, depending on the Linux distribution).
Rebuilding the default R environment and all managed code environments (i.e., reinstalling all R packages for each environment).
The second step is needed because binary compatibility between different versions of R is not guaranteed, so packages built under the old version can break if they are not reinstalled and the R environments are not rebuilt. In particular, upgrading R from v3.4 to v3.5 is known to have broken all installed packages. One such example can be seen in this GitHub thread.
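As a rough sketch of the first step, the helpers below read the R version found on the PATH and upgrade R through whichever package manager is present. The function names are hypothetical, and the package names (R for yum, r-base for apt) are assumptions that depend on the repository R was originally installed from. Comparing the major.minor series before and after is only a heuristic for spotting a change like v3.4 to v3.5; rebuilding the environments is advisable after any upgrade.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical and package names are assumptions.

# Print the version of the R found on PATH, e.g. "4.3.1".
r_version() {
  Rscript -e 'cat(as.character(getRversion()))'
}

# Reduce a version string to its major.minor series ("3.5.2" -> "3.5").
r_minor_series() {
  echo "$1" | cut -d. -f1,2
}

# Upgrade R with whichever package manager is available (run as root).
upgrade_r() {
  if command -v yum >/dev/null 2>&1; then
    yum update -y R                      # assumed package name on yum-based systems
  elif command -v apt-get >/dev/null 2>&1; then
    apt-get update && apt-get install -y --only-upgrade r-base   # assumed package name
  else
    echo "No supported package manager found" >&2
    return 1
  fi
}
```

A typical session might record `before=$(r_version)`, run `upgrade_r` as root, and then compare `r_minor_series "$before"` with `r_minor_series "$(r_version)"` to confirm that the version actually changed.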
Rebuilding the default R environment and managed code environments
To rebuild the default R environment (found under <dss_data_directory>/R.lib), you will generally want to rename or remove this directory and then re-run the install-r-integration script. For more detailed instructions, please refer to the Rebuilding the R environment subsection in our R integration documentation.
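The rename step can be sketched as a small helper that moves R.lib aside under a timestamped name instead of deleting it, which keeps a rollback possible. The function name and the backup naming scheme are hypothetical, and the dssadmin invocation shown in the trailing comment is an assumption to verify against the R integration documentation.

```shell
#!/bin/sh
# Sketch only: backup_r_lib and the ".bak.<timestamp>" suffix are hypothetical.

# Move <datadir>/R.lib aside and print the backup path on success.
backup_r_lib() {
  datadir="$1"
  backup="$datadir/R.lib.bak.$(date +%Y%m%d%H%M%S)"
  if [ ! -d "$datadir/R.lib" ]; then
    echo "No R.lib directory under $datadir" >&2
    return 1
  fi
  mv "$datadir/R.lib" "$backup" && echo "$backup"
}

# Possible usage (assumed invocation; check the R integration documentation):
#   backup_r_lib /path/to/dss_data_directory
#   /path/to/dss_data_directory/bin/dssadmin install-R-integration
```

Keeping the printed backup path somewhere safe makes it easier to restore the old library later if the upgrade has to be reverted.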
As for rebuilding managed code environments, this can be done through the Dataiku user interface by navigating to the Administration > Code Envs tab, clicking on the code environment, and then selecting the “Rebuild env” option when updating the code environment.
Please note that if you have manually installed additional packages in the system’s library (as root), they will also need to be rebuilt, as mentioned in the product documentation.
Rolling back to a previous version of R
If you saved the previous versions of the installed packages (as suggested above, by renaming the <dss_data_directory>/R.lib directory instead of deleting it), rolling back should be as simple as reinstalling the previous version of R with the appropriate system package manager and then restoring the moved-away packages. Otherwise, these packages will need to be reinstalled.
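Restoring the moved-away packages could look like the sketch below, which assumes the previous R version has already been reinstalled. The function name and the "failed-upgrade" suffix are hypothetical. Any R.lib built for the newer R version is moved aside rather than deleted, so nothing is lost if the rollback itself needs to be undone.

```shell
#!/bin/sh
# Sketch only: restore_r_lib and the ".failed-upgrade.<timestamp>" suffix are hypothetical.

# Put a saved copy of R.lib back in place under <datadir>.
restore_r_lib() {
  datadir="$1"
  backup="$2"
  if [ ! -d "$backup" ]; then
    echo "Backup directory not found: $backup" >&2
    return 1
  fi
  # Keep the library built for the newer R version instead of deleting it.
  if [ -e "$datadir/R.lib" ]; then
    mv "$datadir/R.lib" "$datadir/R.lib.failed-upgrade.$(date +%Y%m%d%H%M%S)" || return 1
  fi
  mv "$backup" "$datadir/R.lib"
}
```

For example, `restore_r_lib /path/to/dss_data_directory /path/to/saved/R.lib` would put the saved directory back as R.lib, where both paths are placeholders.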
Why aren’t my R code recipes working anymore after upgrading or migrating my instance, even though the recipes are the same?
The first thing to check is whether the R version has been upgraded on your instance. If so, since R does not maintain binary compatibility between versions, first try rebuilding your default R environment and managed code environments, as R.lib in Dataiku and/or your code environments are likely outdated.
If that doesn’t work, faulty packages have likely been installed at a global level and need to be removed. To do so:
Run R from the terminal of your Dataiku server.
Check where the problematic package is installed, for example with find.package("TheBrokenPackage").
If a global path (such as /usr/share, /usr/lib, or /usr/local) is returned, open an R shell as the root user and remove the package, for example with remove.packages("TheBrokenPackage").
Repeat the previous steps until no broken packages remain.
Please make sure to replace TheBrokenPackage with the name of the actual package.
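The check described in the steps above can also be scripted from the shell. The sketch below uses R's find.package to locate a package, tests whether the result falls under one of the global prefixes mentioned above, and prints the remove.packages call to run as root. The function names are hypothetical, and /usr/lib64 is an added assumption for distributions that install R libraries there. The removal is only printed, not executed, so that it stays a deliberate manual action.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; /usr/lib64 is an added assumption.

# Print the directory R would load the given package from.
locate_r_package() {
  Rscript -e 'cat(find.package(commandArgs(trailingOnly = TRUE)[1]))' "$1"
}

# Succeed if the path sits under a global (system-wide) prefix.
is_global_lib_path() {
  case "$1" in
    /usr/share/*|/usr/lib/*|/usr/lib64/*|/usr/local/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report where a package lives and, if it is global, how to remove it.
check_r_package() {
  pkg="$1"
  path="$(locate_r_package "$pkg")" || return 1
  if is_global_lib_path "$path"; then
    echo "$pkg is installed globally at $path"
    echo "As root, in an R shell: remove.packages(\"$pkg\", lib = \"$(dirname "$path")\")"
  else
    echo "$pkg is installed at $path (not a global path)"
  fi
}
```

Running `check_r_package TheBrokenPackage` for each failing package, and repeating after each removal, mirrors the manual loop described above.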
Where to learn more?
For more information about setting up the R integration in Dataiku, please visit R integration.
For more information about using R in Dataiku, you can find more general content on the documentation page dedicated to R.
For more hands-on demonstrations of using R in Dataiku, you may also want to check out the following tutorials.