Research Seminar

Sparse simultaneous component analysis


Katrijn Van Deun


KU Leuven

Abstract: High-throughput data are complex, so methods that reveal the structure underlying the data are especially useful. Principal component analysis is a popular technique for this purpose. Increasingly, the challenge is to reveal structure in several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities. Simultaneous component methods, which extend principal component analysis to such coupled data, are very promising in this respect. However, interpreting the principal and simultaneous components is often daunting because the contribution of every biomolecule in every data source (every transcript and every protein) has to be taken into account.
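
As a rough illustration of how simultaneous component analysis extends PCA to coupled data, the sketch below concatenates column-centered blocks that share the same rows (the same biological entities) and takes a truncated SVD. This yields one set of common component scores T and one loading matrix per block. Python with NumPy is an assumption (the announcement names no software), and the function name `sca` and the scaling T^T T = nI are illustrative choices, not necessarily those of the presented method.

```python
import numpy as np


def sca(blocks, n_components):
    """Sketch of simultaneous component analysis on coupled data blocks.

    blocks: list of arrays of shape (n_objects, J_k) with the same rows.
    Returns common scores T (n x R) and a list of block loadings (J_k x R).
    """
    # Center every variable, then concatenate the blocks column-wise.
    centered = [b - b.mean(axis=0) for b in blocks]
    X = np.hstack(centered)
    n = X.shape[0]
    # Truncated SVD gives the least-squares rank-R approximation X ~ T P^T.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components] * np.sqrt(n)                      # T^T T = n I
    P = Vt[:n_components].T * s[:n_components] / np.sqrt(n)  # loadings
    # Split the stacked loading matrix back into one part per data source.
    splits = np.cumsum([b.shape[1] for b in blocks])[:-1]
    return T, np.split(P, splits, axis=0)
```

For example, with a hypothetical 20 x 50 transcript block and a 20 x 10 protein block, `T, (P_tr, P_pr) = sca([transcripts, proteins], 2)` returns 20 x 2 scores shared by both sources, plus 50 x 2 and 10 x 2 loadings. Interpreting each component requires inspecting all 60 loadings, which is the burden that sparsity is meant to reduce.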

We propose a sparse simultaneous component method that shrinks many of the parameters to zero, which amounts to variable selection. The method is flexible with respect to both the component model and the sparse structure imposed: variable selection can be imposed on either the component weights or the loadings, and within data blocks, across data blocks, or both. A penalty-based approach is used that includes the lasso, the ridge penalty, the group lasso, and the elitist lasso; suitable penalties, or combinations of them, yield the desired sparse structure. Principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis are special cases of the method. The model is estimated with a procedure that combines alternating least squares and majorization-minimization. Using simulated data, we will evaluate the performance of the method for different sparsity structures (e.g., between blocks, within blocks) and different (combinations of) penalties, and we will compare the models with sparse component weights to those with sparse loadings. The relevance of the method for psychology will be illustrated with an application to emotion data.
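
A minimal sketch of the four penalties and of one way to combine alternating least squares with majorization-minimization follows. It assumes a weight-based model X ~ X W P^T with P^T P = I, penalties applied per component, and Python/NumPy. The function names and the parameters `lam_lasso`, `lam_ridge`, and `lam_group` are hypothetical, and this is not presented as the speaker's implementation. The P-step is an orthogonal Procrustes update. The W-step replaces |w| by the quadratic majorizer w^2/(2|w0|) + |w0|/2 at the current estimate w0, and each block norm analogously, so every update reduces to a ridge-like linear solve. Only the lasso, ridge, and group lasso enter the fitting loop. The elitist lasso is only evaluated, and sparse loadings are not covered.

```python
import numpy as np


def penalties(W, block_sizes):
    """Evaluate lasso, ridge, group lasso and elitist lasso for W (J x R)."""
    edges = np.cumsum([0] + list(block_sizes))
    groups = [W[a:b] for a, b in zip(edges[:-1], edges[1:])]
    lasso = np.abs(W).sum()          # sparsity within blocks, per variable
    ridge = (W ** 2).sum()           # shrinkage without exact zeros
    # Group lasso: sqrt(J_k) * Euclidean norm per block and component,
    # which can switch off a whole block for a component (across blocks).
    group = sum(np.sqrt(g.shape[0]) * np.linalg.norm(g, axis=0).sum()
                for g in groups)
    # Elitist lasso: squared L1 norm per block and component, which keeps
    # a few variables in every block (selection within blocks).
    elitist = sum((np.abs(g).sum(axis=0) ** 2).sum() for g in groups)
    return {"lasso": lasso, "ridge": ridge, "group": group, "elitist": elitist}


def sparse_sca(X, block_sizes, R, lam_lasso=0.0, lam_ridge=0.0,
               lam_group=0.0, n_iter=200, eps=1e-8):
    """ALS/MM sketch for X ~ X W P^T with P^T P = I and penalties on W."""
    X = X - X.mean(axis=0)
    XtX = X.T @ X
    block_id = np.repeat(np.arange(len(block_sizes)), block_sizes)
    sizes = np.asarray(block_sizes, dtype=float)
    # Start from the unpenalized solution (leading right singular vectors).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    W = Vt[:R].T.copy()
    for _ in range(n_iter):
        # P-step: maximize tr(P^T X^T X W) subject to P^T P = I (Procrustes).
        U, _, Vt2 = np.linalg.svd(XtX @ W, full_matrices=False)
        P = U @ Vt2
        # W-step: with P fixed the loss equals ||X P - X W||^2 + const,
        # so each component column can be updated separately.
        for r in range(R):
            w0 = W[:, r]
            c = np.maximum(np.abs(w0), eps)              # lasso majorizer
            d = np.maximum(np.sqrt(np.bincount(block_id, weights=w0 ** 2)), eps)
            diag = (lam_lasso / (2 * c) + lam_ridge
                    + lam_group * np.sqrt(sizes[block_id]) / (2 * d[block_id]))
            W[:, r] = np.linalg.solve(XtX + np.diag(diag), XtX @ P[:, r])
    return W, P
```

Because the quadratic majorizer only drives weights toward zero and never exactly to zero, weights below a small tolerance would in practice be set to zero after convergence. With all penalties at zero and more objects than variables, the sketch reduces to ordinary simultaneous component analysis, with W equal to P.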

Date: Tue Nov 29, 12:00 pm - 1:00 pm
Place: room 01.07 (Department of Psychology, Tiensestraat 102, 3000 Leuven)