Have you ever wondered what the Fuzzy Match Tool does? It helps identify non-identical duplicates in a column. What are non-identical duplicates? They are values that refer to the same thing but are not written identically, such as the same name with and without a space. In the examples below, I use comic book superheroes to help explain the tool. Please note that this tool is complex, and the examples below are just a couple of ways to use it.
In the below data set, I have a list of superheroes.
I can use the Fuzzy Match Tool to tell me I have duplicates for the number 002 in the Issue column by configuring the tool as shown below:
Merge/Purge Mode - Purge Mode
Record ID Field - Issue
Field Name - Issue
Match Style - Address (choose the preset that most closely resembles the type of data in the field. Preset types are listed in this link here).
Advanced Options - check the boxes for the items below:
Output Match Score
Output Unmatched Records
The records will appear as below:
These results show me that I have an exact match (100%) for the number 001 in the Issue column.
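To make the idea behind this first example concrete outside of Alteryx, here is a minimal Python sketch of exact-duplicate detection on a single column. It is not how the Fuzzy Match Tool works internally. The rows, the issue numbers, and the find_exact_duplicates helper are all made up for illustration and do not reproduce the data set from this article.

```python
from collections import defaultdict

# Hypothetical rows for illustration only (not the article's data set).
rows = [
    {"Issue": "101", "SuperHeroes": "Batman"},
    {"Issue": "102", "SuperHeroes": "Spider Man"},
    {"Issue": "103", "SuperHeroes": "Superman"},
    {"Issue": "102", "SuperHeroes": "Spiderman"},
]


def find_exact_duplicates(records, field):
    """Return {value: [row indexes]} for values that occur more than once."""
    positions = defaultdict(list)
    for index, record in enumerate(records):
        positions[record[field]].append(index)
    # Keep only the values shared by two or more rows.
    return {value: idx for value, idx in positions.items() if len(idx) > 1}


duplicates = find_exact_duplicates(rows, "Issue")
for value, indexes in duplicates.items():
    print(f"Issue {value} appears in rows {indexes}")
```

An exact comparison like this only catches values that are written identically, which is why the SuperHeroes column needs a different approach.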
What if I want to pick up on the duplicate Spiderman record? Notice that the first instance of Spiderman contains a space. So how do you identify it as a duplicate?
When I attach the Fuzzy Match Tool, I select the following:
Merge/Purge Mode - Purge Mode
Record ID Field - SuperHeroes
Field Name - SuperHeroes
Match Style - Custom
I then click on the Edit button next to Custom.
Preprocess - Strip Punctuation
Generate Keys - Double Metaphone (This is the preferred algorithm for matching. It compares words phonetically by reducing them to the sounds they make, which also helps match words that are spelled incorrectly.)
Match Function - None - Key Match Only
I receive the following output.
If I were to go back and change Spider Man to Speder Men, it would still show up in my results, because the Double Metaphone algorithm matches on how the words sound rather than on their exact spelling.
For more information about the Fuzzy Match Tool algorithms, refer to this link on the help.alteryx.com site.
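For readers who want to see the general idea of key-based matching in code, here is a rough Python sketch. Please note that it does NOT implement Double Metaphone. The crude_sound_key function is a deliberately simplified stand-in that I made up for illustration: it keeps only letters, keeps the first letter, drops the remaining vowels, and skips repeated consonants. The names in the list and the group_by_key helper are also hypothetical, and I am assuming that the preprocessing step removes the space as well as punctuation.

```python
import re
from collections import defaultdict


def strip_to_letters(text):
    """Preprocess: keep letters only (drops spaces and punctuation), uppercase."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


def crude_sound_key(text):
    """Simplified stand-in for a phonetic key (NOT Double Metaphone)."""
    cleaned = strip_to_letters(text)
    if not cleaned:
        return ""
    key = cleaned[0]  # always keep the first letter
    for ch in cleaned[1:]:
        if ch in "AEIOUY":
            continue  # ignore vowels after the first letter
        if ch != key[-1]:  # skip a consonant equal to the last kept character
            key += ch
    return key


def group_by_key(names):
    """Key match only: names match when their keys are identical."""
    groups = defaultdict(list)
    for name in names:
        groups[crude_sound_key(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical values for illustration only.
heroes = ["Spider Man", "Batman", "Superman", "Spiderman", "Speder Men"]
print(group_by_key(heroes))
```

Because every spelling of the name collapses to the same key, the variant with a space and the misspelled variant land in the same group. A real phonetic algorithm uses far more detailed rules for how letter combinations sound, so treat this only as a picture of the key-matching idea.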