ENBIS: European Network for Business and Industrial Statistics
ENBIS-14 in Linz
21 – 25 September 2014; Johannes Kepler University, Linz, Austria
Abstract submission: 23 January – 22 June 2014
The following abstracts have been accepted for this event:

Probabilistic Clustering of Panel Time Series Using a Time-Inhomogeneous Model Built Around Markov Chains
Authors: Stefan Pittner (Vienna University of Economics and Business), Sylvia Frühwirth-Schnatter (Vienna University of Economics and Business)
Primary area of focus / application: Mining
Secondary area of focus / application: Modelling
Keywords: Model-based clustering, Discrete time series, Bayesian inference, Markov chain Monte Carlo, Gibbs sampler, Markov mixture models, Applications in economics
Submitted at 12-Jun-2014 21:34 by Stefan Pittner
Accepted
In this work, each of a given number H of clusters is represented by a separate Markov chain. The model M underlying our clustering procedure assumes that a time series belongs to one of the H clusters if and only if it was generated by the Markov chain of that cluster.
Two different time series generated by the same Markov chain are, in general, relatively far apart in standard metrics such as the Euclidean distance. Our model-based clustering approach therefore contrasts with the more common distance-based clustering (such as k-means clustering): it targets the overall state-changing behavior of a time series instead of its functional form.
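To make the contrast concrete, here is a toy sketch (not taken from the paper) with two hypothetical binary series that switch states in the same rhythm but are shifted against each other. They have identical one-step transition counts, the statistic a Markov chain model looks at, yet they are farther apart in Euclidean distance than one of them is from a constant series whose state-changing behavior is completely different. The series and helper names are illustrative assumptions.

```python
import math
from collections import Counter


def transition_counts(series):
    """Count one-step transitions (a -> b) observed in a discrete series."""
    return Counter(zip(series[:-1], series[1:]))


def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical binary series, e.g. 0 = not employed, 1 = employed.
x = [0, 0, 1, 1, 0, 0, 1, 1, 0]
y = [1, 1, 0, 0, 1, 1, 0, 0, 1]  # same rhythm, shifted: every position differs
z = [0] * 9                      # never changes state

print(transition_counts(x) == transition_counts(y))  # True: same switching behavior
print(euclidean(x, y), euclidean(x, z))              # 3.0 2.0
```

A distance-based method would group x with z; a Markov-chain-based model would group x with y.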
The presentation starts with model M = M_1, which generates time series by assigning each series to a cluster at random according to a discrete distribution eta_1, eta_2, ..., eta_H over the H clusters.
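Under model M_1, the posterior probability that a given series belongs to cluster h is proportional to eta_h times the likelihood of the series under that cluster's transition matrix. The sketch below computes these probabilities and simulates data from M_1. It assumes the first observation is conditioned on (the abstract does not specify an initial-state distribution, so the simulator simply draws it uniformly); function names and parameter values are hypothetical.

```python
import math
import random


def log_lik_markov(series, P):
    """Log-likelihood of a discrete series under transition matrix P,
    conditioning on the first observation (assumption)."""
    return sum(math.log(P[a][b]) for a, b in zip(series[:-1], series[1:]))


def cluster_posterior(series, eta, Ps):
    """P(cluster = h | series) under M_1: proportional to eta_h * likelihood."""
    logs = [math.log(e) + log_lik_markov(series, P) for e, P in zip(eta, Ps)]
    m = max(logs)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in logs]
    total = sum(w)
    return [v / total for v in w]


def simulate_m1(n_series, length, eta, Ps, rng):
    """Draw a cluster from eta, then a path from that cluster's Markov chain."""
    data = []
    for _ in range(n_series):
        h = rng.choices(range(len(eta)), weights=eta)[0]
        P = Ps[h]
        state = rng.randrange(len(P))  # uniform initial state (assumption)
        path = [state]
        for _ in range(length - 1):
            state = rng.choices(range(len(P)), weights=P[state])[0]
            path.append(state)
        data.append((h, path))
    return data


# Two illustrative clusters: "persistent" and "frequently switching".
eta = [0.5, 0.5]
Ps = [
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.1, 0.9], [0.9, 0.1]],
]
print(cluster_posterior([0, 0, 0, 0, 1, 1, 1, 1], eta, Ps))  # most mass on cluster 0
print(simulate_m1(3, 8, eta, Ps, random.Random(0)))
```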
The main model M = M_2 (a) allows the cluster of each time series to depend on a vector of static covariates, which must be available for every time series. This is achieved by expressing each probability eta_h of model M_1 as a multinomial logit, so that any covariate can be either discrete or continuous. (b) In addition, M_2 refines model M_1: instead of representing a cluster by a single Markov chain for the whole time period, each cluster has a separate Markov chain for each subperiod of an equidistant partition of the time period. This lets the model adapt to changes in behavior over the course of a time series caused by exogenous circumstances.
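A minimal sketch of the two ingredients of M_2, under stated assumptions: the cluster probabilities eta_h(x) are computed as a multinomial logit with the last cluster as the zero-coefficient baseline (a common identification choice, assumed here), and a cluster's likelihood uses a different transition matrix in each subperiod. The abstract only says the partition is equidistant; here the T - 1 transitions are split into blocks of (nearly) equal length. All names and numbers are hypothetical.

```python
import math


def cluster_probs(x, betas):
    """Multinomial logit: eta_h(x) = exp(x . beta_h) / sum_k exp(x . beta_k).

    By assumption, the last coefficient vector is the all-zero baseline."""
    scores = [sum(xi * bi for xi, bi in zip(x, beta)) for beta in betas]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [v / total for v in w]


def subperiod_index(t, T, n_sub):
    """Map transition t (from y[t-1] to y[t], t = 1..T-1) to one of n_sub
    blocks of (nearly) equal length."""
    return (t - 1) * n_sub // (T - 1)


def log_lik_inhomogeneous(series, Ps_by_period):
    """Log-likelihood of one series under a cluster with a separate
    transition matrix for each subperiod."""
    T = len(series)
    n_sub = len(Ps_by_period)
    total = 0.0
    for t in range(1, T):
        P = Ps_by_period[subperiod_index(t, T, n_sub)]
        total += math.log(P[series[t - 1]][series[t]])
    return total


# Hypothetical covariate vector (first entry is an intercept) and coefficients.
x = [1.0, 2.0]
betas = [[0.5, -0.2], [0.0, 0.0]]
print(cluster_probs(x, betas))
print([subperiod_index(t, 9, 2) for t in range(1, 9)])  # [0, 0, 0, 0, 1, 1, 1, 1]
```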
Two additional models, M = M_1a and M = M_2a, arise from models M_1 and M_2, respectively, by letting the Markov chain of each cluster depend on discrete covariates with a finite number of levels. In model M_2a, these covariates can be identical to, overlap with, or differ from the covariate vector of item (a) above. Moreover, in model M_2a these covariates can be dynamic, i.e., they can vary across the subperiods of item (b).
The parameters of each clustering model are estimated by Bayesian inference using a Markov chain Monte Carlo (MCMC) sampling scheme. In every case, this is a three-stage Gibbs sampler involving quick draws from standard distributions. The MCMC methods require no structural changes when moving from model M_1 to M_1a or from model M_2 to M_2a.
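For the simplest case, model M_1 with Dirichlet priors on eta and on each row of every transition matrix, a three-stage Gibbs sampler can be sketched as follows: (1) draw each series' cluster indicator from its full conditional, (2) draw eta from a Dirichlet given the cluster sizes, (3) draw each row of each transition matrix from a Dirichlet given the transition counts. The priors, the hyperparameters e0 and a0, and the function names are assumptions for illustration; this is not the authors' sampler, and the multinomial-logit part of M_2 needs additional machinery not shown here. Cluster labels are only identified up to permutation.

```python
import math
import random


def dirichlet(alphas, rng):
    """Dirichlet draw via normalized Gamma(alpha_k, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [v / s for v in g]


def gibbs_m1(data, H, K, n_iter, rng, e0=4.0, a0=1.0):
    """Illustrative three-stage Gibbs sampler for model M_1.

    data: list of series with states 0..K-1; H: number of clusters.
    Assumed priors: eta ~ Dir(e0, ..., e0), each row of each transition
    matrix ~ Dir(a0, ..., a0)."""
    eta = [1.0 / H] * H
    Ps = [[dirichlet([a0] * K, rng) for _ in range(K)] for _ in range(H)]
    draws = []
    for _ in range(n_iter):
        # Stage 1: cluster indicator of each series given eta and Ps.
        S = []
        for y in data:
            logs = [
                math.log(eta[h])
                + sum(math.log(Ps[h][a][b]) for a, b in zip(y[:-1], y[1:]))
                for h in range(H)
            ]
            m = max(logs)
            S.append(rng.choices(range(H), weights=[math.exp(v - m) for v in logs])[0])
        # Stage 2: cluster weights given the cluster sizes.
        eta = dirichlet([e0 + S.count(h) for h in range(H)], rng)
        # Stage 3: each row of each transition matrix given transition counts.
        N = [[[0] * K for _ in range(K)] for _ in range(H)]
        for y, h in zip(data, S):
            for a, b in zip(y[:-1], y[1:]):
                N[h][a][b] += 1
        Ps = [[dirichlet([a0 + n for n in N[h][j]], rng) for j in range(K)]
              for h in range(H)]
        draws.append((S, eta, Ps))
    return draws


# Toy data: persistent series (first 10) versus alternating series (last 10).
data = [[0] * 20] * 5 + [[1] * 20] * 5 + [[0, 1] * 10] * 10
S, eta, Ps = gibbs_m1(data, H=2, K=2, n_iter=100, rng=random.Random(42))[-1]
print(S)
```

On data this clearly separated, the final draw should group the series by their switching behavior, up to a relabeling of the two clusters.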
The use of these models is briefly demonstrated for an application in econometrics (employment status after bankruptcy) and a marketing application (customer purchase histories for textiles).
Bayesian Modeling of Time Series of Counts
Authors: Refik Soyer (Department of Decision Sciences, The George Washington University)
Primary area of focus / application: Finance
Secondary area of focus / application: Modelling
Keywords: Time-series analysis, State-space, Environmental models, Poisson and negative-binomial time series, Bayesian inference, Markov chain Monte Carlo
Submitted at 16-Jun-2014 11:58 by Marco P. Seabra dos Reis
Accepted
Joint work with Tevfik Aktekin (University of New Hampshire) and Bumsoo Kim (Sogang University, Seoul, Korea) 
Measuring the Effectiveness of Process Improvement in a Non-Randomized Experiment
Authors: Susana Vegas (Universidad de Piura), Valeria Quevedo (Universidad de Piura)
Primary area of focus / application: Quality
Secondary area of focus / application: Process
Keywords: Process improvement, Regression discontinuity, Non-randomized experiment, Effectiveness measurement
Submitted at 16-Jun-2014 18:02 by Susana Vegas
Accepted
In some cases, operating conditions make it impossible to run a randomized experiment. To overcome this, this paper addresses the use of the Regression Discontinuity (RD) approach as an alternative procedure for estimating the effect of a treatment. RD can be used when treatment is assigned according to a known “threshold” of an explanatory variable.
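A sharp RD effect is often estimated by local linear regression: fit a straight line to the outcome on each side of the threshold within a bandwidth, and take the difference between the two fitted values at the threshold. The sketch below does this with a uniform kernel. The bandwidth, the kernel, the data, and the hypothetical `treat_below` switch (for cases where treatment is assigned below the cutoff, as a remedial program might be) are assumptions; the abstract does not state which RD estimator was used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_estimate(running, outcome, cutoff, bandwidth, treat_below=False):
    """Sharp RD effect via local linear fits on each side of the cutoff
    (uniform kernel). The running variable is centered at the cutoff, so each
    intercept is the fitted outcome at the threshold."""
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    at_left, _ = fit_line(*zip(*left))
    at_right, _ = fit_line(*zip(*right))
    jump = at_right - at_left
    # If treatment is assigned below the cutoff, the effect is the reverse jump.
    return -jump if treat_below else jump


# Hypothetical noise-free data: the outcome jumps by 2.0 at the cutoff 0.
running = [i / 10 for i in range(-20, 21)]
outcome = [1.0 + 0.5 * r + (2.0 if r >= 0 else 0.0) for r in running]
print(rd_estimate(running, outcome, cutoff=0.0, bandwidth=1.0))
```

With a binary outcome such as passing or failing, the same estimate reads as a difference in pass probabilities at the threshold.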
An application verifying the effectiveness of a remedial program for undergraduate students is presented. The remedial program was found to have a marginal effect of 32.8 percentage points on the students' expected probability of passing the first semester. The findings suggest that RD can be used effectively to assess performance under the conditions stated above.
Mixture Models for Text Mining in R
Authors: Bettina Grün (Johannes Kepler Universität Linz)
Primary area of focus / application: Mining
Keywords: Bag-of-words model, Finite mixture model, Text mining, Topic model, R
Among these bag-of-words models, two different mixture models have been proposed: finite mixtures of von Mises-Fisher distributions and the latent Dirichlet allocation topic model. Finite mixtures of von Mises-Fisher distributions are fitted under the assumptions that each document belongs to only one cluster and that only the directional information in the data is relevant. The latent Dirichlet allocation topic model is a generative model for the term frequencies in a document that aims to capture the observed dependencies between them. Each document is assumed to be a mixture of several topics, and each topic is characterized by its own term distribution.
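The sketch below illustrates both modelling assumptions on a toy vocabulary: the latent Dirichlet allocation generative process (draw topic proportions theta from a Dirichlet, then for each word draw a topic and then a term from that topic's term distribution), and scaling a term-frequency vector to unit length, which keeps only the directional information a von Mises-Fisher mixture uses. It is plain Python for illustration, not the movMF or topicmodels implementation; the vocabulary, alpha, and topic distributions are made up.

```python
import math
import random


def dirichlet(alphas, rng):
    """Dirichlet draw via normalized Gamma(alpha_k, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [v / s for v in g]


def generate_lda_document(n_words, alpha, topics, rng):
    """Generate one document (list of term ids) under the LDA generative model.

    topics[k] is the term distribution of topic k; alpha is a symmetric
    Dirichlet parameter for the document's topic proportions."""
    theta = dirichlet([alpha] * len(topics), rng)  # this document's topic mixture
    words = []
    for _ in range(n_words):
        k = rng.choices(range(len(topics)), weights=theta)[0]         # topic of this word
        w = rng.choices(range(len(topics[k])), weights=topics[k])[0]  # term from topic k
        words.append(w)
    return words


def term_frequencies(words, vocab_size):
    """Bag-of-words representation: counts per term, word order discarded."""
    tf = [0] * vocab_size
    for w in words:
        tf[w] += 1
    return tf


def unit_length(tf):
    """Scale to unit Euclidean length, keeping only the direction."""
    norm = math.sqrt(sum(v * v for v in tf))
    return [v / norm for v in tf]


# Hypothetical 4-term vocabulary with two topics.
topics = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
doc = generate_lda_document(50, 0.5, topics, random.Random(7))
print(term_frequencies(doc, 4))
# A document and a copy with doubled counts point in the same direction.
print(unit_length([2, 0, 1]), unit_length([4, 0, 2]))
```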
We give an introduction to these models and their estimation. Furthermore, we present the R packages movMF and topicmodels, which allow these models to be fitted. Both packages build on and extend functionality from the text mining package tm. We outline the functionality the packages provide and illustrate their application.
Multivariate Data Analysis in the Big Data Era
Authors: Alberto Ferrer (Universidad Politecnica de Valencia)
Primary area of focus / application: Modelling
Secondary area of focus / application: Process
Keywords: Big data, Multivariate data analysis, Latent structures, Data analytics
Submitted at 18-Jun-2014 13:33 by Alberto J. Ferrer-Riquelme
Accepted

Using Informative Missing to Build Models That Make Better Predictions
Authors: Volker Kraft (JMP), Ian Cox (JMP)
Primary area of focus / application: Modelling
Keywords: Data quality, Informative missing, Prediction, Modelling, JMP Pro
Submitted at 18-Jun-2014 18:21 by Volker Kraft
Accepted