Troubleshoot | A visual recipe job log says “Computation will not be distributed”
For a visual recipe, two general causes of slow performance are using an inefficient execution engine and using data formats that prevent the optimal execution engine from being used.
Suboptimal dataset formats
Another way to tell whether a visual recipe is not optimized is to look in the job log for the message “Computation will not be distributed.” This message indicates that something is suboptimal in your input/output dataset format, the engine you’ve selected, or the permissions on the input/output dataset connection.
For example, using the fast-path when writing to an S3 CSV dataset requires that the output dataset does not have a header row configured. If you write to an output S3 CSV dataset that does have one, the job log contains an entry saying that the fast-path cannot be used, followed by the message that the computation will not be distributed, which can lead to a performance issue:
[2022/01/21-17:47:35.980] [null-err-43] [INFO] [dku.utils] - [2022/01/21-17:47:35.978]
Cannot use Csv write fast-path for Csv-S3 dataset: Csv fast-path output is disabled in configuration
[2022/01/21-17:47:35.982] [null-err-43] [INFO] [dku.utils] - [2022/01/21-17:47:35.978]
Writing S3 dataset as remote dataframe.
Computation will not be distributed
In each of the above cases, it’s usually best to modify your Flow in a way that will allow you to use the fast-path and preferred engine.
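The log check described above can be automated once you have the job log as text. The following is a minimal sketch, not a Dataiku API: the function name `find_distribution_warnings` and the `SAMPLE_LOG` constant are hypothetical. The marker strings are copied from the sample log entries above, and other recipes or versions may word them differently.

```python
import re

# Marker strings taken from the sample log above. This is an assumption:
# other recipe types or versions may phrase these messages differently.
NOT_DISTRIBUTED = "Computation will not be distributed"
FAST_PATH_RE = re.compile(r"Cannot use .*fast-path.*")


def find_distribution_warnings(log_text):
    """Return (line_number, text) pairs for log lines that suggest
    the recipe did not run on the fast-path or was not distributed."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        stripped = line.strip()
        if NOT_DISTRIBUTED in stripped or FAST_PATH_RE.search(stripped):
            hits.append((number, stripped))
    return hits


# Hypothetical input: the log excerpt shown earlier in this article.
SAMPLE_LOG = """\
[2022/01/21-17:47:35.980] [null-err-43] [INFO] [dku.utils] - [2022/01/21-17:47:35.978]
Cannot use Csv write fast-path for Csv-S3 dataset: Csv fast-path output is disabled in configuration
[2022/01/21-17:47:35.982] [null-err-43] [INFO] [dku.utils] - [2022/01/21-17:47:35.978]
Writing S3 dataset as remote dataframe.
Computation will not be distributed
"""

if __name__ == "__main__":
    for number, text in find_distribution_warnings(SAMPLE_LOG):
        print(f"line {number}: {text}")
```

The “Cannot use ... fast-path” line usually states the reason, so reporting it alongside the “Computation will not be distributed” line points you to the dataset setting or engine choice to change.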