SMABS 2004 Jena University
Contributions: Abstract

Estimation of the correlation coefficient based on selected data

Gösta Hägglund Rolf Larsson
Stockholm University Uppsala University

In psychometrics, one often encounters data that cannot be regarded as a random sample, because they were selected systematically according to some explanatory variable. For a real-world example, see Grimm (1993), who discusses the correlation between Graduate Record Examination scores and the results obtained by students at the end of their first year of graduate school in the USA. Since only the selected students are observed, the range of the scores is restricted and this correlation becomes attenuated.
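
The attenuation described above can be illustrated with a small simulation. The sketch below is not taken from the paper: the population correlation (0.6), the cutoff (1.0), the sample size and all function names are assumptions chosen for illustration. It also applies the classical textbook correction for direct range restriction on x (often called Thorndike's Case 2), which is one of the standard existing corrections and is not one of the methods proposed here.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sd(xs):
    """Sample standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def thorndike_case2(r, u):
    """Classical correction for direct selection on x.

    r: correlation in the restricted sample
    u: SD of x in the unrestricted group / SD of x in the restricted group
    """
    return r * u / math.sqrt(1.0 + r * r * (u * u - 1.0))


# Assumed illustration values, not taken from the paper.
random.seed(0)
rho, cutoff = 0.6, 1.0

population = []
for _ in range(50000):
    x = random.gauss(0.0, 1.0)  # e.g. an admission test score
    y = rho * x + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)  # later result
    population.append((x, y))

# Only cases with x above the cutoff are "admitted" and hence observed.
admitted = [(x, y) for x, y in population if x > cutoff]

r_all = pearson([p[0] for p in population], [p[1] for p in population])
r_admitted = pearson([p[0] for p in admitted], [p[1] for p in admitted])
u = sd([p[0] for p in population]) / sd([p[0] for p in admitted])
r_corrected = thorndike_case2(r_admitted, u)

print("full sample:", round(r_all, 3))
print("admitted only:", round(r_admitted, 3))
print("corrected:", round(r_corrected, 3))
```

The correlation among the admitted cases comes out clearly below the full-sample value, and the correction moves it back toward the population value. Note that the correction needs the unrestricted standard deviation of x, which is available here only because the whole population was simulated.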

In our paper, we consider maximum likelihood estimation when the data are presumed to arise from a bivariate normal distribution that is truncated in an extreme way. Two methods are presented and compared with existing methods; see Sackett and Yang (2000). One of our methods is purely numerical, while the other is based on an approximation. Both methods are tried on simulated as well as real data. One of our data sets consists of summed marks for 14561 Swedish college students together with their scores on the Swedish Scholastic Aptitude Test in 1993; see Gustafsson, Wedman and Westerlund (1992). The purely numerical method is shown to be the most reliable overall, but in some cases the computationally less burdensome approximate method works almost as well.
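
To give a flavour of what numerical maximum likelihood under truncation can look like, here is a minimal sketch under strong simplifying assumptions that are ours and not the paper's: both variables are standard normal in the unselected population, a case is observed only if x > a and y > b with known cutoffs a and b, and the correlation rho is the only unknown. The truncated log-likelihood is the ordinary bivariate normal log-likelihood minus n times the log of the selection probability P(X > a, Y > b), which is computed here by Simpson's rule. All names and numerical settings are hypothetical, and this is not the authors' algorithm.

```python
import math
import random


def orthant_prob(rho, a, b, upper=8.0, steps=400):
    """P(X > a, Y > b) for a standard bivariate normal, via Simpson's rule."""
    s = math.sqrt(1.0 - rho * rho)
    h = (upper - a) / steps

    def integrand(x):
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        # P(Y > b | X = x), because Y | X = x is N(rho * x, 1 - rho^2)
        tail = 0.5 * math.erfc((b - rho * x) / (s * math.sqrt(2.0)))
        return phi * tail

    total = integrand(a) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3.0


def loglik(rho, n, sxx, sxy, syy, a, b):
    """Truncated bivariate normal log-likelihood in rho (constants dropped)."""
    q = 1.0 - rho * rho
    return (-0.5 * n * math.log(q)
            - (sxx - 2.0 * rho * sxy + syy) / (2.0 * q)
            - n * math.log(orthant_prob(rho, a, b)))


def mle_rho(xs, ys, a, b):
    """Maximise the log-likelihood by a coarse grid, then a finer grid."""
    n = len(xs)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    def objective(r):
        return loglik(r, n, sxx, sxy, syy, a, b)

    coarse = [i / 100.0 for i in range(-95, 96)]
    best = max(coarse, key=objective)
    fine = [best + j / 2000.0 for j in range(-20, 21)]
    return max(fine, key=objective)


# Simulated check with assumed values: rho = 0.6, cutoffs a = b = 0.5.
random.seed(1)
rho_true, a, b = 0.6, 0.5, 0.5
xs, ys = [], []
for _ in range(100000):
    x = random.gauss(0.0, 1.0)
    y = rho_true * x + math.sqrt(1.0 - rho_true ** 2) * random.gauss(0.0, 1.0)
    if x > a and y > b:  # only doubly selected cases are kept
        xs.append(x)
        ys.append(y)

rho_hat = mle_rho(xs, ys, a, b)
print("selected cases:", len(xs), "estimated rho:", round(rho_hat, 3))
```

A grid search is used instead of a derivative-based optimiser only to keep the sketch dependency-free. With unknown means and variances the problem becomes a five-parameter optimisation and would normally be handed to a general-purpose numerical optimiser.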


Grimm, L.G. (1993) Statistical Applications for the Behavioral Sciences. New York: Wiley.

Gustafsson, J.-E., Wedman, I. & Westerlund, A. (1992) The Dimensionality of the Swedish Scholastic Aptitude Test. Scandinavian Journal of Educational Research, 36, 21-39.

Sackett, P.R. & Yang, H. (2000) Correction for Range Restriction: An Expanded Typology. Journal of Applied Psychology, 85, 112-118.