Concept | Fuzzy join recipe
When joining datasets, you usually need key column values to match exactly.
However, data isn’t always clean or uniform. Capitalization, spacing, and spelling might vary for matching values.
For instance, you might want to match Grey with gray.
The Fuzzy join recipe makes it possible to join datasets even when strings, numbers, or geopoints don’t exactly align.
Fuzzy logic
There are a few ways to control how the Fuzzy join recipe approximates key matches. The main configurations are listed below.
| Configuration | Description |
|---|---|
| Distance | The difference between values. Find information on available distances in the reference documentation. |
| Threshold | The level of difference that you will allow. |
| Normalization | Types of text processing that standardize key column values during matching. |
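
To make the three configurations concrete, here is a minimal Python sketch of the general idea. It is not how the Fuzzy join recipe is implemented. The function names (`normalize`, `levenshtein`, `fuzzy_join`) are hypothetical, and Levenshtein edit distance with lowercasing and whitespace collapsing are assumptions chosen for illustration. The recipe's actual distances and normalization options are described in the reference documentation.

```python
def normalize(value: str) -> str:
    """Normalization (assumed here): lowercase and collapse repeated whitespace."""
    return " ".join(value.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Distance (assumed here): number of single-character edits between a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or exact character match)
            ))
        previous = current
    return previous[-1]


def fuzzy_join(left, right, threshold=1):
    """Threshold: keep pairs whose normalized keys differ by at most `threshold` edits."""
    matches = []
    for left_key in left:
        for right_key in right:
            distance = levenshtein(normalize(left_key), normalize(right_key))
            if distance <= threshold:
                matches.append((left_key, right_key))
    return matches


# Hypothetical key columns from two datasets.
left_keys = ["Grey", "Dark  Blue", "Red"]
right_keys = ["gray", "dark blue", "green"]

pairs = fuzzy_join(left_keys, right_keys, threshold=1)
print(pairs)
```

In this sketch, normalization alone is enough to match the two spellings of dark blue, while Grey and gray match only because the threshold tolerates one edit. Setting `threshold=0` would reduce the join to an exact match on normalized values.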
What’s next?
Ultimately, you can use a fuzzy join to combine datasets without prior data preparation or coding. Apply this knowledge in Tutorial | Fuzzy join recipe!
See also
Learn about additional details and settings in Fuzzy join: joining two datasets.