Upgrading the R version used in Dataiku DSS

Question:

I have the R integration configured in Dataiku DSS, but I want to upgrade the underlying version of R that is being used. How should this be handled, and are there any best practices or considerations to be aware of? Also, is it possible to roll back this upgrade in case something goes wrong?

Upgrading R

Upgrading the base R version used in a given Dataiku DSS environment is generally a two-step process:

  • Upgrading the R distribution itself on the server (typically using the system package manager, such as yum or apt depending on the Linux OS that is being used)

  • Rebuilding the default R environment and all managed code environments (i.e. reinstall all R packages for each environment)

The second step is needed because R does not guarantee binary compatibility between versions, so packages that are not reinstalled (and environments that are not rebuilt) may stop working after the upgrade. In particular, upgrading R from v3.4 to v3.5 has been known to break all installed packages. One such example can be seen in this GitHub thread.
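Before and after upgrading, it can help to record which R version is actually in use, so that you know whether a rebuild is due. The following is a minimal shell sketch, not part of DSS: the function names r_version_from_banner, r_series, and r_series_changed are hypothetical, and the sketch assumes that `R --version` prints a line of the usual form "R version x.y.z (...)". It only flags a change in the x.y series (such as 3.4 to 3.5); since compatibility is not guaranteed between versions in general, rebuilding after any upgrade remains the safer choice.

```shell
# Extract "x.y.z" from the output of `R --version` (passed as $1).
# Assumes a line that starts with "R version x.y.z".
r_version_from_banner() {
  printf '%s\n' "$1" | sed -n 's/^R version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Reduce "x.y.z" to its "x.y" series.
r_series() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed (exit status 0) when the two versions belong to different x.y series.
r_series_changed() {
  [ "$(r_series "$1")" != "$(r_series "$2")" ]
}

# Example usage (requires R on the PATH):
#   before=$(r_version_from_banner "$(R --version)")
#   ... upgrade R with yum or apt ...
#   after=$(r_version_from_banner "$(R --version)")
#   r_series_changed "$before" "$after" && echo "Rebuild R.lib and all code environments"
```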

Rebuilding the default R environment and managed code environments

When rebuilding the default R environment (found under <dss_data_directory>/R.lib), you will generally want to rename or remove this directory and then re-run the install-r-integration script. For more detailed instructions about how this can be done, please refer to the Rebuilding the R environment subsection in our R integration documentation.
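The rename step can be sketched as a small shell helper. This is an illustration only: backup_r_lib and the R.lib.<suffix> naming scheme are hypothetical, and the commands shown in the trailing comments (stopping DSS, running dssadmin install-r-integration from the data directory, restarting DSS) are assumptions that should be checked against the R integration documentation for your DSS version.

```shell
# Move <data_dir>/R.lib aside as <data_dir>/R.lib.<suffix> and print the new path.
# $1 = DSS data directory, $2 = suffix for the saved copy (for example "r34").
# Refuses to run if R.lib is missing or the target name is already taken.
backup_r_lib() {
  data_dir=$1
  src="$data_dir/R.lib"
  dst="$data_dir/R.lib.$2"
  [ -d "$src" ] || { echo "no R.lib under $data_dir" >&2; return 1; }
  [ ! -e "$dst" ] || { echo "$dst already exists" >&2; return 1; }
  mv "$src" "$dst" && printf '%s\n' "$dst"
}

# Example usage (assumed sequence, run from the DSS data directory):
#   ./bin/dss stop
#   backup_r_lib "$PWD" r34
#   ./bin/dssadmin install-r-integration
#   ./bin/dss start
```

Renaming rather than deleting keeps the old packages available in case you need to roll back.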

Managed code environments can be rebuilt from the Dataiku DSS user interface: navigate to the Administration > Code Envs tab, click on the code environment, and select the “Rebuild env” option when updating it.

[Image: Rebuilding a code environment]

Please note that if you have manually installed additional packages in the system’s library (as root), they will also need to be rebuilt, as mentioned in our documentation here.

Rolling back to a previous version of R

If you saved the previously installed packages (by renaming the <dss_data_directory>/R.lib directory instead of deleting it, as suggested above), rolling back should be as simple as reinstalling the previous version of R with the appropriate system package manager and then restoring the moved-away packages. Otherwise, the packages will need to be reinstalled.
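Restoring the moved-away packages can be sketched in shell as follows. The helper restore_r_lib and the R.lib.<suffix> names are hypothetical; the sketch assumes the old library was saved next to R.lib in the DSS data directory, that the previous R version has already been reinstalled, and that DSS is stopped while the directories are swapped.

```shell
# Put a saved library back in place as <data_dir>/R.lib.
# $1 = DSS data directory
# $2 = suffix of the saved copy (R.lib.<suffix>)
# $3 = suffix under which to park the current, post-upgrade R.lib (if present)
restore_r_lib() {
  data_dir=$1
  saved="$data_dir/R.lib.$2"
  parked="$data_dir/R.lib.$3"
  current="$data_dir/R.lib"
  [ -d "$saved" ] || { echo "no saved library at $saved" >&2; return 1; }
  if [ -e "$current" ]; then
    # Keep the post-upgrade library instead of deleting it.
    [ ! -e "$parked" ] || { echo "$parked already exists" >&2; return 1; }
    mv "$current" "$parked" || return 1
  fi
  mv "$saved" "$current"
}

# Example usage:
#   restore_r_lib /path/to/dss_data_directory r34 r35-failed
```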

Troubleshooting

Why aren’t my R code recipes working anymore after upgrading or migrating my instance, even though the recipes are the same?

The first thing to check is whether the R version has been upgraded on your instance. If it has, your R.lib directory and/or your code environments are likely outdated, because R does not guarantee binary compatibility between versions. Start by rebuilding your default R environment and your managed code environments.

If that doesn’t work, faulty packages have likely been installed at a global level and need to be removed. To remove them:

  • Run R from the terminal of your Dataiku DSS server (from the DSS data directory).

./bin/R

  • Check for the problematic package(s).

find.package("TheBrokenPackage")

  • If a global path (such as /usr/share, /usr/lib, or /usr/local) is returned, open an R shell as the root user and remove the package(s).

remove.packages(c("TheBrokenPackage"))

  • Repeat the previous steps until no broken packages remain.

Warning

Please make sure to replace TheBrokenPackage with the name of the actual package.
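If you have several packages to check, the "is this a global path?" decision can be scripted. The sketch below is only an illustration: is_global_r_path is a hypothetical helper, the prefix list mirrors the examples given above, and /usr/lib64 is an added assumption for 64-bit RPM-based systems. Adjust the list to match your server.

```shell
# Succeed (exit status 0) when the given package path lives under a
# system-wide location rather than under the DSS data directory.
is_global_r_path() {
  case "$1" in
    /usr/share/*|/usr/lib/*|/usr/lib64/*|/usr/local/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage, with a path copied from the output of find.package():
#   if is_global_r_path "/usr/lib/R/site-library/TheBrokenPackage"; then
#     echo "Remove this package from an R shell started as root"
#   fi
```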

Where to learn more?

  • For more information about setting up the R integration in DSS, please see our reference documentation.

  • For more information about using R in DSS, we have more general documentation that can be found here.

  • For more hands-on demonstrations of using R in DSS, you may also want to check out the following tutorials.