Concept | Fuzzy join recipe

When joining datasets, you usually need key column values to match exactly. However, data isn’t always clean or uniform. Capitalization, spacing, and spelling might vary for matching values. For instance, you might want to match Grey with gray.

The Fuzzy join recipe makes it possible to join datasets even when strings, numbers, or geopoints don’t exactly align.
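To see why exact matching falls short, here is a minimal Python sketch of an ordinary equality-based join. It is not the recipe's implementation. The sample rows, column names, and the `exact_join` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical sample data, made up for this illustration.
left = [
    {"color": "Grey", "item": "scarf"},
    {"color": "blue", "item": "hat"},
]
right = [
    {"color": "gray", "hex": "#808080"},
    {"color": "blue", "hex": "#0000FF"},
]


def exact_join(left_rows, right_rows, key):
    """Inner join that only pairs rows whose key values are identical."""
    # Index the right-hand rows by their key value for fast lookup.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left_rows:
        # "Grey" is not equal to "gray", so that row finds no partner.
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


exact_rows = exact_join(left, right, "color")
print(exact_rows)
```

Only the `blue` rows are paired. The `Grey` and `gray` rows are dropped even though they describe the same color, which is the gap a fuzzy join is meant to close.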

Fuzzy logic

There are a few ways to control how the Fuzzy join recipe approximates key matches. The main configurations are listed below.

| Configuration | Description |
| --- | --- |
| Distance | The measure of difference between two key values. Find information on available distances in the reference documentation. |
| Threshold | The maximum difference you will allow between two values for them to still count as a match. |
| Normalization | Types of text processing that standardize key column values during matching. |
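The way these three settings interact can be sketched in plain Python. This is an illustration under stated assumptions, not the recipe's actual implementation: plain Levenshtein edit distance stands in for whichever distance you select, normalization is reduced to lowercasing and whitespace cleanup, and the `normalize`, `levenshtein`, and `fuzzy_matches` helpers and the sample values are all hypothetical.

```python
def normalize(value):
    """One possible normalization: lowercase and collapse extra whitespace."""
    return " ".join(value.lower().split())


def levenshtein(a, b):
    """Edit distance: fewest single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution (or exact character match)
                )
            )
        previous = current
    return previous[-1]


def fuzzy_matches(left_values, right_values, threshold):
    """Pair values whose normalized distance is at or below the threshold."""
    matches = []
    for left_value in left_values:
        for right_value in right_values:
            distance = levenshtein(normalize(left_value), normalize(right_value))
            if distance <= threshold:
                matches.append((left_value, right_value, distance))
    return matches


# Hypothetical key column values.
left_keys = ["Grey", " Blue "]
right_keys = ["gray", "blue"]

strict = fuzzy_matches(left_keys, right_keys, threshold=0)
relaxed = fuzzy_matches(left_keys, right_keys, threshold=1)
print(strict)
print(relaxed)
```

With a threshold of 0, normalization alone is enough to pair `" Blue "` with `blue`, but `Grey` and `gray` stay unmatched. Raising the threshold to 1 tolerates the single differing letter, so both pairs match. A larger threshold admits more matches at the risk of pairing values that are not truly related.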

What’s next?

Ultimately, you can use a fuzzy join to combine datasets without prior data preparation or coding. Apply this knowledge in Tutorial | Fuzzy join recipe!

See also

Learn about additional details and settings in Fuzzy join: joining two datasets.