Research Seminar

Bayesian mixture modeling with variable selection


Tomoki Tokuda


KU Leuven

Abstract: A general problem in clustering high-dimensional data is that the inclusion of irrelevant variables can mask the ‘true’ group structure. For effective clustering of the observations, some form of variable selection is therefore essential.
In this presentation, we will discuss a Bayesian mixture method for high-dimensional data proposed by Tadesse, Sha and Vannucci (JASA, 2005). In this method, only a subset of the variables is used to cluster the observations into distinct groups; the remaining variables are ‘switched off’ and treated as background variables for the clustering.
However, this method has two drawbacks. First, theoretical and simulation results imply that the method is not scale-invariant (i.e., changing the unit of measurement of one variable may change the results); second, the results of the method appear to be sensitive to the number of irrelevant variables. These problems may considerably hamper the use of the method in practice. Their root cause lies in the use of the eigenvalues of the empirical covariance matrix of the data to specify the hyperparameters of some of the prior distributions. An alternative method that overcomes these problems, and that essentially relies on standardization of the data, will be discussed.
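The scale-dependence argument can be illustrated with a minimal sketch. It does not reproduce the actual prior specification of Tadesse, Sha and Vannucci. It only shows that rescaling a single variable changes the eigenvalues of the empirical covariance matrix, so any hyperparameter derived from them changes as well. After standardization, the eigenvalues (those of the correlation matrix) are unaffected. The simulated data and the helper names `cov_eigenvalues` and `standardize` are hypothetical.

```python
import numpy as np

def cov_eigenvalues(x):
    """Eigenvalues of the empirical covariance matrix (columns = variables)."""
    return np.linalg.eigvalsh(np.cov(x, rowvar=False))

def standardize(x):
    """Center each variable and scale it to unit sample standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))      # hypothetical data: 200 observations, 3 variables

x_rescaled = x.copy()
x_rescaled[:, 0] *= 1000.0         # change the unit of variable 0 (e.g. metres -> millimetres)

raw_before = cov_eigenvalues(x)
raw_after = cov_eigenvalues(x_rescaled)
std_before = cov_eigenvalues(standardize(x))
std_after = cov_eigenvalues(standardize(x_rescaled))

print(np.allclose(raw_before, raw_after))  # False: covariance eigenvalues depend on units
print(np.allclose(std_before, std_after))  # True: standardized data are unit-free
```

Any hyperparameter computed from `raw_after` rather than `raw_before` would differ by orders of magnitude even though the data carry the same information. This is the kind of non-invariance that standardizing the data is meant to remove.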

Date: Tue May 20, 12:15 pm - 1:15 pm
Place: room 00.60 (Department of Psychology, Tiensestraat 102, 3000 Leuven)