Troubleshoot | The Dataiku UI is slow to load for all users

  • What: The DSS UI takes about a minute to load across all projects and all parts of DSS. The slowness is not tied to a particular project Flow or dataset; it affects all users across the board.

  • Who: Multiple users have reported that DSS is taking a minute or more to load for them.

  • When: We noticed that this issue started yesterday. Interestingly, we did restart DSS this morning and everything seemed fine again. However, about an hour ago, we started to experience slow load times again.

  • Where: This issue is with the DSS UI, so the issue is likely restricted to the DSS server and not related to externally processed jobs.

Troubleshooting steps

This issue is impacting all users on the DSS server, so it’s probably not specific to a user’s environment or an individual project. At this point, you can perform some initial investigations of the DSS server. We can break this down into a couple of different steps:

  • Check resource usage on the DSS server. It’s always good to do some brief checks to see if you might be facing a resource issue on your server.

Run the following commands to see which processes are running on your DSS server and how much disk space is available:

  • ps auxf

  • top

  • df -h
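Interactive output is hard to paste into a support ticket, so it can help to capture one-off snapshots instead. The following is a minimal sketch rather than a Dataiku tool: the helper name `full_filesystems` and the 90% threshold are assumptions chosen for illustration, and the `ps` and `top` options in the comments assume a Linux server with the standard procps utilities.

```shell
# Read `df -P` output on stdin and print every mount point whose usage
# is at or above the given threshold in percent (default: 90).
full_filesystems() {
  limit="${1:-90}"
  awk -v limit="$limit" 'NR > 1 {
    use = $5                 # fifth column is "Capacity", e.g. 95%
    sub(/%/, "", use)        # strip the percent sign for a numeric comparison
    if (use + 0 >= limit) print $6, $5
  }'
}

# Example usage on the DSS server:
#   df -P | full_filesystems 90
#   ps aux --sort=-%mem | head -n 11   # header plus the ten largest processes by memory
#   top -b -n 1 | head -n 15           # a single non-interactive snapshot of top
```

Mount points that contain spaces are not handled by this sketch, which is usually acceptable for a quick check.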

  • Do some initial investigation of the DSS backend logs.

DSS has detailed logging that can help you diagnose what is happening. As a general tip, you can tail the backend log whenever you need to see what is affecting the DSS server while it is slow:

tail -f <DATA_DIR>/run/backend.log

If you are seeing a slow UI issue specifically, you also might want to check if you are running into a garbage collection issue on the server. A quick way to do this is by running the following command:

grep "Full GC" <DATA_DIR>/run/backend.log | grep -v JEK | grep -v FEK

It’s common for this command to return entries even when everything is fine on your DSS server. It becomes a problem when each entry takes several seconds, because the backend pauses during a full garbage collection and the UI lags. For example, the following garbage collection entries each take roughly 30 to 40 seconds:

36401.998: [Full GC (Allocation Failure)  12268M->12266M(12288M), 29.3987391 secs]
36431.480: [Full GC (Allocation Failure)  12268M->12266M(12288M), 39.1208729 secs]
36470.651: [Full GC (Allocation Failure)  12268M->12266M(12288M), 39.0166883 secs]
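To list only the long pauses instead of reading every entry, a small filter can help. This is a minimal sketch and not part of DSS: the function name `slow_full_gc` and the 5-second default threshold are assumptions, and it expects the entry format shown in the example above, where the duration is followed by `secs]`.

```shell
# Read log lines on stdin and print each Full GC entry whose pause is at
# least the given number of seconds (default: 5).
slow_full_gc() {
  threshold="${1:-5}"
  # Isolate each "Full GC ... N secs" entry, even when several share one line.
  grep -o 'Full GC[^]]*secs' |
    awk -v t="$threshold" '{
      secs = $(NF - 1)       # the field just before the trailing "secs"
      if (secs + 0 >= t) print $0
    }'
}

# Example usage (DATA_DIR stands for your DSS data directory):
#   grep "Full GC" "$DATA_DIR/run/backend.log" | grep -v JEK | grep -v FEK | slow_full_gc 5
```

If the filter prints nothing, no Full GC pause in the log reached the threshold.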

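Repeated Full GC pauses of this length mean the DSS backend is running out of heap memory: in the example above, each collection frees only about 2 MB of a 12 GB heap (12268M->12266M out of 12288M). Sometimes the fix is to increase the DSS backend.xmx memory setting.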
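Before raising the limit, it helps to know what it is currently set to. The sketch below assumes the setting lives in the `[javaopts]` section of `install.ini` in the DSS data directory; the helper name `current_backend_xmx` is hypothetical, and the `16g` value in the comments is only an illustration, so check the Dataiku documentation for your version before changing anything.

```shell
# Print the backend.xmx value from the [javaopts] section of an install.ini
# file. Prints nothing when the key is not set in that section.
current_backend_xmx() {
  awk -F'=' '
    /^\[/ { in_javaopts = ($0 ~ /^\[javaopts\]/) }   # track the current section
    in_javaopts && $1 ~ /^[ \t]*backend\.xmx[ \t]*$/ {
      gsub(/[ \t]/, "", $2)                          # trim spaces around the value
      print $2
    }
  ' "$1"
}

# Example usage (DATA_DIR stands for your DSS data directory):
#   current_backend_xmx "$DATA_DIR/install.ini"
#
# A typical change sequence, to be run during a maintenance window:
#   "$DATA_DIR/bin/dss" stop
#   # edit install.ini so that it contains, for example:
#   #   [javaopts]
#   #   backend.xmx = 16g     (hypothetical value; size it to the server RAM)
#   "$DATA_DIR/bin/dssadmin" regenerate-config
#   "$DATA_DIR/bin/dss" start
```

Leave enough memory for the operating system and the other DSS processes when choosing a value.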

It’s also possible that a user performed a particular action that caused a memory issue on the server. For example, a user might have tried to view an entire 30 GB dataset in the Explore tab of DSS by increasing the sample size setting to cover the whole dataset.

Loading that much data into a sample can cause a performance issue on the system that affects every user.

Educate your users on why the sample size defaults to 10,000 rows: the limit is there to prevent this kind of negative performance impact.