ENBIS-17 in Naples

9 – 14 September 2017; Naples (Italy)
Abstract submission: 21 November 2016 – 10 May 2017

Outlier Detection for Multivariate Categorical Data

11 September 2017, 10:30 – 10:50


Submitted by Josep Ginebra
Josep Ginebra (Universitat Politecnica de Catalunya), Xavier Puig (Universitat Politecnica de Catalunya)
Statistical analysis assumes that data can be approximately described by a given statistical model. That assumption can fail either because the model is wrong for all the observations, or because it fails only for a small subset of outliers. The literature on outlier detection deals mainly with continuous data. Outlier detection methods for multivariate categorical data, often presented as contingency tables, mostly address the detection of outlying cells rather than outlying rows or columns.

Here, the rows of the contingency table are the units of interest, and the goal is to find rows that depart from the distributional behavior of the majority. The adopted solution extends the framework proposed by Box and Tiao in 1968 for normal models by adapting it to multinomial models with random effects. That approach assumes two alternative models: a basic model adequate for the majority of the observations and a second model for outliers. It then uses Bayesian cluster analysis to classify each observation into either the outlier or the non-outlier group.
The method starts by estimating the variability of the main set of observations from a subset of observations known to be uncontaminated by outliers, and then it identifies the outlying observations and their distribution. The method applies to any type of contingency table, but we focus on its use in the analysis of multivariate control charts.
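
The two-model idea can be illustrated with a minimal sketch. It is not the authors' method, only a simplified stand-in under stated assumptions. Each row is scored under a "basic" Dirichlet-multinomial model centred on the category proportions of a clean subset and under a flat Dirichlet-multinomial "outlier" model, and Bayes' rule gives the posterior probability that the row belongs to the outlier group. The data, the `concentration` value, the prior outlier probability of 0.05 and all function names are hypothetical choices made for illustration.

```python
from math import exp, lgamma, log


def dirmult_logpmf(counts, alpha):
    """Log probability of a count vector under a Dirichlet-multinomial model."""
    n = sum(counts)
    a_sum = sum(alpha)
    out = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    out += lgamma(a_sum) - lgamma(n + a_sum)
    out += sum(lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha))
    return out


def outlier_posterior(row, alpha_basic, alpha_outlier, prior_outlier=0.05):
    """Posterior probability that a row comes from the outlier model."""
    log_basic = log(1.0 - prior_outlier) + dirmult_logpmf(row, alpha_basic)
    log_out = log(prior_outlier) + dirmult_logpmf(row, alpha_outlier)
    m = max(log_basic, log_out)  # subtract the max for numerical stability
    w_basic = exp(log_basic - m)
    w_out = exp(log_out - m)
    return w_out / (w_basic + w_out)


# Hypothetical rows assumed to be free of outliers (three categories each).
clean_rows = [[48, 31, 21], [52, 29, 19], [50, 30, 20]]
totals = [sum(col) for col in zip(*clean_rows)]
grand_total = sum(totals)

# Assumption: a fixed concentration stands in for the estimated row-to-row
# variability; a larger value means less overdispersion in the basic model.
concentration = 50.0
alpha_basic = [concentration * t / grand_total for t in totals]

# Assumption: a flat Dirichlet describes outlying rows.
alpha_outlier = [1.0] * len(totals)

# Hypothetical new rows to classify.
new_rows = [[49, 32, 19], [15, 20, 65]]
posteriors = [outlier_posterior(r, alpha_basic, alpha_outlier) for r in new_rows]

for row, p in zip(new_rows, posteriors):
    label = "outlier" if p > 0.5 else "regular"
    print(row, f"P(outlier) = {p:.4f}", label)
```

In this toy setup the first new row resembles the clean subset and receives a low outlier probability, while the second row, whose counts concentrate in the third category, is assigned to the outlier group. A fuller treatment would estimate the variability and the outlier distribution from the data instead of fixing them.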