Research Seminar

Bayesian mixture modeling with variable selection


Tomoki Tokuda


KU Leuven

Abstract: A general problem in clustering high-dimensional data is that the inclusion of irrelevant variables can mask the 'true' group structure. For an effective clustering of observations, some form of variable selection is therefore essential.

In this presentation, I will discuss a Bayesian multivariate normal mixture method with variable selection for high-dimensional data, proposed by Tadesse, Sha and Vannucci (JASA, 2005). This method turns out to have three drawbacks. First, it is not scale-invariant (i.e., changing the unit of measurement of one variable may influence the results); second, its results are sensitive to the number of irrelevant variables; third, it may get trapped in a one-cluster solution. These drawbacks may considerably hamper the use of the method in practice. To address them, I will propose several modifications of the method.

Furthermore, I will briefly discuss the method of Steinley and Brusco (Psychometrika, 2008). This method is based on the k-means algorithm, combined with a so-called clusterability index for screening potentially discriminating variables. In an earlier comparison, it outperformed various alternative clustering methods with variable selection.

Finally, I will outline a new simulation study that compares the performance of three methods: the original Tadesse method, our modified Tadesse method, and the Steinley and Brusco method.
Date: Tue Feb 10, 12:15 pm - 1:15 pm
Place: room 02.51 (Department of Psychology, Tiensestraat 102, 3000 Leuven)