Contributions: Abstract

Most of the data used in social epidemiology, and in the social sciences in general, are collected in standardized interviews. However, growing reliance on information technology means that new data sets are increasingly generated by exchanging information among existing databases ("record linkage"). Record linkage merges data from different data sets through a common key, for example a name, a social security number or another permanent personal identification number. As long as the keys used for the merge are error-free, the merge is unproblematic. Problems arise when the keys contain errors: names are misspelled, the data sets being matched contain different versions of a person's name, or data entry errors result in switched digits in identification numbers. The purpose of the record-linkage research is to develop and evaluate computer programs that merge individual-level data from different data sets while tolerating error-prone keys. Algorithms for determining string similarity and the choice of optimal thresholds are discussed. Some empirical results on the performance of different algorithms will be presented.
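
The idea of linking on error-prone keys with a similarity threshold can be sketched as follows. This is an illustration only, not the software evaluated in the research: the abstract does not name the similarity measures used, so the sketch assumes Levenshtein edit distance normalised by the length of the longer string. The function names (levenshtein, similarity, link), the record layout and the threshold of 0.75 are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after trimming and lowercasing."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def link(left, right, field, threshold=0.75):
    """Pair each left record with its most similar right record.

    A pair is kept only if the similarity of the key field reaches the
    threshold (an assumed value here; choosing it well is the hard part).
    """
    pairs = []
    for rec in left:
        best = max(right, key=lambda r: similarity(rec[field], r[field]), default=None)
        if best is not None and similarity(rec[field], best[field]) >= threshold:
            pairs.append((rec, best))
    return pairs


# Hypothetical records: one spelling variant, one name with no counterpart.
survey = [{"name": "Meyer"}, {"name": "Schulz"}]
registry = [{"name": "Meier"}, {"name": "Schmidt"}]
matches = link(survey, registry, "name", threshold=0.75)
print([(l["name"], r["name"]) for l, r in matches])  # [('Meyer', 'Meier')]
```

Under this measure two switched digits count as two substitutions, so "12345" and "12354" have distance 2. Lowering the threshold accepts more true matches with damaged keys but also more false links, which is why the choice of threshold has to be evaluated empirically.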