New Zealand Statistical Association 2024 Conference

Manori Wickramasinghe

Statistics New Zealand

Improving probabilistic linking in the IDI

This is joint work with Marina Chen.

In probabilistic record linking, accurate name matching is critical, particularly in datasets with culturally diverse populations. The IDI used the SOUNDEX phonetic algorithm as a blocking variable, but SOUNDEX often struggles to match non-European names, which leads to lower link rates for ethnic groups such as Māori, Pacific, and Chinese. Short names also tend to lower the link rate, further challenging the accuracy of record linkage. We explore how replacing SOUNDEX with the NYSIIS (New York State Identification and Intelligence System) algorithm, along with other methods, can enhance linkage accuracy for diverse populations. Attendees will gain insights into improving link rates for non-European names and boosting overall record linkage quality.

