SMABS 2004 Jena University

European Association of Methodology

Department of methodology and evaluation research

Jena University

Contributions: Abstract

Interpretation and variable selection in redundancy analysis

M. Rosário Oliveira and João A. Branco
Technical University of Lisbon
Portugal

The study of the relationship between two sets of variables is of great interest in many fields. The traditional method for this problem is canonical correlation analysis (CCA). However, interpreting the results of CCA can pose several difficulties for the analyst. To overcome these difficulties, Stewart and Love (1968) proposed an index, called the redundancy index, that measures the mean variance of the variables of one set that is explained by the canonical variates of the other set, and van den Wollenberg (1977) proposed a method called redundancy analysis (RA) that searches for linear combinations that maximize this index.
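As a rough illustration of the index described above, the following sketch computes, for a single linear combination of one set, the mean squared correlation between that variate and each variable of the other set. With standardized variables this is the share of variance explained by that variate. The function names, the argument layout (lists of rows) and the toy data are assumptions made for this example and do not come from the paper.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def redundancy_index(X, Y, weights):
    """Mean squared correlation between each column of Y and the variate X @ weights.

    X, Y    : lists of rows (one row per observation); hypothetical layout.
    weights : one coefficient per column of X.
    """
    # Variate: the linear combination of the X variables for each observation.
    variate = [sum(w * x for w, x in zip(weights, row)) for row in X]
    q = len(Y[0])
    y_columns = [[row[j] for row in Y] for j in range(q)]
    return sum(pearson(variate, col) ** 2 for col in y_columns) / q


# Toy data (invented): the first X column determines Y exactly.
X = [[1, 0], [2, 1], [3, 0], [4, 1]]
Y = [[2], [4], [6], [8]]
index_first = redundancy_index(X, Y, [1, 0])   # variate = first X column
index_second = redundancy_index(X, Y, [0, 1])  # variate = second X column
```

Here `index_first` is 1 because the variate reproduces the single Y variable up to scale, while `index_second` is much smaller. RA looks for the weights that make this quantity as large as possible.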

RA is particularly suitable when the two sets of variables play different roles (one being identified as the predictor set and the other as the criterion set). The objective is to estimate linear combinations of the predictors that maximize the variance explained among the variables of the criterion set. This differs from CCA, where the two sets play symmetric roles and the objective is to measure the multivariate association between them. Applications of RA have appeared in a variety of fields such as behavioural sciences (DeSarbo, 1981), genetics (van Eeuwijk, 1992), medicine (Friedman and Thayer, 1991), ecology (Legendre and Anderson, 1999) and other areas.
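The maximization described above can be written compactly. The block below is a sketch of the usual non-robust formulation for the first redundancy variate, not the robust procedure of this paper. The notation is an assumption introduced here: X (n x p) holds the predictors and Y (n x q) the criterion variables, both column-standardized, with correlation matrices R_xx, R_xy and R_yx.

```latex
% Assumed notation (not from the abstract):
%   R_{xx} = X^T X / (n-1),  R_{xy} = X^T Y / (n-1),  R_{yx} = R_{xy}^T.
% The variate t = Xw is constrained to unit variance, so the squared
% correlations between t and the q criterion variables sum to w^T R_{xy} R_{yx} w.
\[
  \max_{w \in \mathbb{R}^{p}} \; \frac{1}{q}\, w^{\top} R_{xy} R_{yx}\, w
  \quad \text{subject to} \quad w^{\top} R_{xx}\, w = 1 .
\]
% Setting the gradient of the Lagrangian to zero gives a generalized eigenproblem:
\[
  \left( R_{xy} R_{yx} - \lambda R_{xx} \right) w = 0 .
\]
```

Under these assumptions the largest eigenvalue divided by q equals the maximized redundancy index. The asymmetry with CCA is visible in the constraint: only the predictor-side variate is normalized, and no variate is formed on the criterion side.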

One difficulty is that the results of RA can be badly distorted by the presence of atypical observations, which appear frequently in multivariate data sets but are difficult to detect. Robust methods are well suited to this situation.

In this paper robust redundancy analysis, as developed in Oliveira and Branco (2002) and Oliveira et al. (2003), is used to approach two different problems: (i) interpretation of redundancy variates and (ii) variable selection. The methods are illustrated with real data.

References

DeSarbo, W.S. (1981). Canonical/redundancy factoring analysis. Psychometrika, 46, 307-329.

Friedman, B.H. and Thayer, J.F. (1991). Facial muscle activity and EEG recordings: redundancy analysis. Electroencephalography and Clinical Neurophysiology, 79, 358-360.

Legendre, P. and Anderson, M.J. (1999). Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecological Monographs, 69, 1-24.

Oliveira, M.R. and Branco, J.A. (2002). Comparison of three methods for robust redundancy analysis. In: R. Dutter, P. Filzmoser, U. Gather and P.J. Rousseeuw, editors, Developments in robust statistics, pp. 287-295, Springer-Verlag: Heidelberg.

Oliveira, M.R., Branco, J.A., Croux, C. and Filzmoser, P. (2003). Robust redundancy analysis by alternating regression. In: M. Hubert, G. Pison, A. Struyf and S. van Aelst, editors, Series: Statistics for Industry and Technology, Accepted for publication, Birkhäuser: Basel.

Stewart, D.K. and Love, W.A. (1968). A general canonical correlation index. Psychological Bulletin, 70, 160-163.

van den Wollenberg, A.L. (1977). Redundancy analysis: an alternative for canonical correlation analysis. Psychometrika, 42, 207-219.

van Eeuwijk, F.A. (1992). Interpreting genotype-by-environment interaction using redundancy analysis. Theoretical and Applied Genetics, 85, 89-100.