Carleton University - School of Computer Science Honours Project
Fall 2020
Utilizing Traditional Blocking Approaches in Record Linkage and Measuring its Effectiveness
ABSTRACT
Record linkage is the process of determining whether records from different data sources refer to the same entity. It has significant implications for many agencies and industries in both the public and private sectors. In this project, we focus on the blocking stage of record linkage. The data consists of CSV tables of randomly generated personal information: one table holds the ground-truth values, and a second table with the same schema holds a manipulated copy containing typographic errors and missing entries. The experiment tried different blocking schemes and compared subsets of attributes to find the subset that produced the highest F-Measure. The precision, recall, and F-Measure values for the experiment were 73.75%, 95.16%, and 83.1%, respectively. The process was repeated several times over the 16 attributes, and only 3 were needed to produce the highest F-Measure, which greatly reduced the number of pairwise comparisons.
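To make the blocking and evaluation steps concrete, the following is a minimal sketch of exact-key blocking followed by precision, recall, and F-Measure computation. It is not the project's implementation: the function names, the `id`, `surname`, and `postcode` fields, and the toy records are all hypothetical, and the abstract does not say which blocking schemes or which 3 attributes were actually used.

```python
from collections import defaultdict
from itertools import product


def blocking_key(record, attributes):
    """Build a blocking key from the chosen attributes (trimmed, lowercased)."""
    return tuple((record.get(a) or "").strip().lower() for a in attributes)


def candidate_pairs(table_a, table_b, attributes):
    """Return the (id_a, id_b) pairs whose records share a blocking key."""
    blocks = defaultdict(lambda: ([], []))
    for rec in table_a:
        blocks[blocking_key(rec, attributes)][0].append(rec["id"])
    for rec in table_b:
        blocks[blocking_key(rec, attributes)][1].append(rec["id"])
    pairs = set()
    for ids_a, ids_b in blocks.values():
        # Only records inside the same block are compared with each other.
        pairs.update(product(ids_a, ids_b))
    return pairs


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs, true_matches):
    """Score candidate pairs against the known true matches."""
    true_positives = len(pairs & true_matches)
    precision = true_positives / len(pairs) if pairs else 0.0
    recall = true_positives / len(true_matches) if true_matches else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical toy data: table_b is a corrupted copy of table_a.
table_a = [
    {"id": "a1", "surname": "Smith", "postcode": "K1S"},
    {"id": "a2", "surname": "Jones", "postcode": "K2P"},
    {"id": "a3", "surname": "Brown", "postcode": "K1S"},
]
table_b = [
    {"id": "b1", "surname": "smith", "postcode": "K1S"},
    {"id": "b2", "surname": "Jnoes", "postcode": "K2P"},  # typographic error
    {"id": "b3", "surname": "Brown", "postcode": "K1S"},
]
true_matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}

for attrs in (["postcode"], ["surname"]):
    pairs = candidate_pairs(table_a, table_b, attrs)
    p, r, f = evaluate(pairs, true_matches)
    print(attrs, len(pairs), round(p, 3), round(r, 3), round(f, 3))
```

On this toy data, blocking on `postcode` keeps every true match but admits two false pairs, while blocking on `surname` is perfectly precise but loses the pair with the typographic error. Both schemes compare fewer than the 9 pairs a full cross-comparison would need, which is the trade-off the attribute-subset search explores. Applying `f_measure` to the reported precision and recall (0.7375 and 0.9516) gives approximately 0.831, which agrees with the reported F-Measure.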