Removing Duplicates¶
Excel can remove duplicate values, using all columns or a subset to determine uniqueness of a row. Duplicates are simply removed, with no way to recover them later.
Dataiku’s Distinct recipe identifies and removes duplicate rows within a dataset. Additionally, it can track which rows had duplicates, and how many, in the original dataset.
See also
Distinct recipe lessons in the Visual Recipes Overview course
Distinct recipe in the reference documentation