Have you ever wondered what the Fuzzy Match Tool does? It helps identify non-identical duplicates in a column. What are non-identical duplicates? They are values that are meant to be the same but are spelled or formatted differently. In the examples below, I use comic superheroes to help explain this tool. Please note that this tool is complex, and the examples below are just a couple of the ways to use it.
In the data set below, I have a list of superheroes.
data:image/s3,"s3://crabby-images/bc380/bc380c48989858f1230e22ea6b96bc43b71507aa" alt=""
I can use the Fuzzy Match Tool to tell me that I have duplicates for the number 002 in the Issue column by configuring the tool as shown below:
Merge/Purge Mode - Purge Mode
Record ID Field - Issue
Field Name - Issue
Match Style - Address (choose the preset that most closely resembles the data type of the field; the preset types are listed in the linked help page).
Advanced Options - check the boxes for the items below:
Output Match Score
Output Unmatched Records
data:image/s3,"s3://crabby-images/c11ca/c11caf06d5123c199639f8f24efe59be6d08a1ca" alt=""
The records appear as shown below:
data:image/s3,"s3://crabby-images/17841/178417312771644189593af33d61f2d73f65d0e0" alt=""
These results show me that I have an exact match (100%) for the number 001 in the Issue column.
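To make the idea of an exact (100%) match concrete outside of Alteryx, here is a minimal Python sketch. It is not how the Fuzzy Match Tool works internally; it only groups rows whose Issue value is identical. The records and the `find_exact_duplicates` helper are hypothetical, since the real data set appears only in the screenshot.

```python
from collections import defaultdict

# Hypothetical records (assumption): stand-ins for the data in the screenshot.
records = [
    {"Issue": "010", "SuperHeroes": "Hero A"},
    {"Issue": "011", "SuperHeroes": "Hero B"},
    {"Issue": "011", "SuperHeroes": "Hero C"},
    {"Issue": "012", "SuperHeroes": "Hero D"},
]


def find_exact_duplicates(rows, field):
    """Group row indexes by the value of `field` and keep groups of 2+ rows."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[row[field]].append(index)
    return {value: indexes for value, indexes in groups.items() if len(indexes) > 1}


duplicates = find_exact_duplicates(records, "Issue")
print(duplicates)
```

Only values that occur more than once are reported, which is roughly what a purge-style duplicate check on a single field is looking for.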
What if I want to pick up the duplicate Spiderman entry? Notice that the first instance, Spider Man, contains a space. So how do you identify it as a duplicate?
When I attach the Fuzzy Match Tool, I select the following:
Merge/Purge Mode - Purge Mode
Record ID Field - SuperHeroes
Field Name - SuperHeroes
Match Style - Custom
data:image/s3,"s3://crabby-images/1da5b/1da5b764d981a14e399712759204152825677ff9" alt=""
I then click on the Edit button next to Custom.
Preprocess - Strip Punctuation
Generate Keys - Double Metaphone (This is the preferred algorithm for matching. It compares words phonetically by reducing them to the sounds they make, which also helps to match values that are spelled incorrectly.)
Match Function - None - Key Match Only
data:image/s3,"s3://crabby-images/f8e23/f8e2304ccb6258fda56c0296d09d321111e2e814" alt=""
I receive the following output.
data:image/s3,"s3://crabby-images/beb70/beb70a796d492b4253865eb5874d6e1312dc40cf" alt=""
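The three custom settings above (preprocess, generate keys, key match only) can be pictured as a small pipeline. The Python sketch below is only an illustration under stated assumptions: `preprocess`, `whole_field_key`, and `key_match` are hypothetical helpers, the hero list is made-up sample data, and the key used here is a plain normalized copy of the field rather than Double Metaphone.

```python
import string
from collections import defaultdict


def preprocess(value):
    """Remove punctuation, loosely mirroring a 'strip punctuation' step."""
    return value.translate(str.maketrans("", "", string.punctuation))


def whole_field_key(value):
    """Stand-in key (assumption): lowercase and drop whitespace. NOT Double Metaphone."""
    return "".join(value.lower().split())


def key_match(values, make_key):
    """Group values whose keys are identical; no similarity score is computed."""
    groups = defaultdict(list)
    for value in values:
        groups[make_key(preprocess(value))].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample values (assumption), not the data from the screenshots.
heroes = ["Spider Man", "Spiderman", "Bat-Man", "Batman", "Hulk"]
matches = key_match(heroes, whole_field_key)
print(matches)
```

Because matching is done on keys alone, two values are either in the same group or not, which is the spirit of the None - Key Match Only setting.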
If I were to go back and change Spider Man to Speder Men, it would still show up in my results, because the Double Metaphone algorithm matches on how the words sound rather than on their exact spelling.
data:image/s3,"s3://crabby-images/8a0ff/8a0ff216ca6229362cc74a81ed3a6fca1f2d89e6" alt=""
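To see why a misspelling like Speder Men can still match, it helps to look at a phonetic-style key. The sketch below uses a deliberately crude key (keep the first letter, drop the remaining vowels, collapse repeated letters). It is a simplified stand-in I am assuming for illustration, not the Double Metaphone algorithm and not Alteryx's implementation; `simple_phonetic_key` is a hypothetical helper.

```python
import re


def simple_phonetic_key(value):
    """Crude consonant-skeleton key for illustration only. NOT Double Metaphone."""
    # Keep letters only, so spaces and punctuation do not affect the key.
    letters = re.sub(r"[^A-Z]", "", value.upper())
    if not letters:
        return ""
    # Keep the first letter, then drop vowels (and Y) from the rest.
    skeleton = letters[0] + re.sub(r"[AEIOUY]", "", letters[1:])
    # Collapse runs of the same letter into a single letter.
    return re.sub(r"(.)\1+", r"\1", skeleton)


names = ["Spider Man", "Spiderman", "Speder Men", "Superman"]
keys = {name: simple_phonetic_key(name) for name in names}
for name, key in keys.items():
    print(f"{name!r} -> {key}")
```

In this sketch the three Spiderman spellings share one key while Superman gets a different one, because the vowels that were misspelled never make it into the key.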
For more information about the Fuzzy Match Tool algorithms, refer to the linked page on the help.alteryx.com site.