ENBIS-15 in Prague

6 – 10 September 2015; Prague, Czech Republic
Abstract submission: 1 February – 3 July 2015

My abstracts


The following abstracts have been accepted for this event:

  • Skeletons, Flying Carpets, and a Saw

    Authors: Christian Ritter (Université Catholique de Louvain)
    Primary area of focus / application: Modelling
    Secondary area of focus / application: Design and analysis of experiments
    Keywords: Regression, Experimental design, Model averaging, Statistical graphics
    Submitted at 2-Apr-2015 08:07 by Christian Ritter
    8-Sep-2015 12:40 Skeletons, Flying Carpets, and a Saw
    How can we visually explore data and models obtained from designed
    experiments (or theoretical considerations) when there are multiple
    inputs, multiple responses, and when the relationships are complex?
    Multiplying or combining graphs of single responses with respect to one
    or several inputs quickly becomes confusing or overly simplistic.
    Animating such response profile displays and interaction graphs with
    sliders can help, but only to a limited extent. The main problem is that
    they do not create visual 'stories' which relate to the subject area and
    thereby fail to inspire the scientist. Here we shall describe two
    types of dual response displays which can under some conditions overcome
    this problem. We then use them to explore data from a paper helicopter
    experiment reported by Box and Liu (1999). Here we saw the design into
    slices and bones and display the results for a range of weighted
    averages between a base and a full model until we understand what's
    going on.
  • What is a MOOC?

    Authors: Nathalie Villa-Vialaneix (INRA, UR 875 MIAT)
    Primary area of focus / application: Education & Thinking
    Keywords: MOOC, Online course, Teaching, Statistics, Lifelong learning
    Submitted at 6-Apr-2015 17:34 by Nathalie Villa-Vialaneix
    Accepted (view paper)
    8-Sep-2015 10:10 What is a MOOC?
    The word MOOC is used for online teaching platforms as well as for the courses available on these platforms. Since the first MOOCs were released in 2012, a huge number of new courses have been proposed on these platforms, and MOOCs have gained a large amount of attention from governments and universities all around the world. The purpose of the proposed communication is to present some of these platforms and courses and to describe the innovative and interesting practices used in them. Several examples of the way the lessons are organized (between videos, quizzes and courses) will be discussed, with a focus mainly on statistics courses. I will also explain and describe how the progress of all learners is synchronized, using deadlines for coursework, reminder emails, and forums and wikis for cooperative work. The different options provided by several courses and platforms will be described and their respective advantages and drawbacks discussed, in order to provide an analysis of what can help workers who want to engage in lifelong learning.
  • Optimization of Stochastic Simulators by Gaussian Process Metamodelling – Application to Maintenance Investments Planning Problems

    Authors: Bertrand Iooss (EDF R&D), Thomas Browne (EDF R&D), Loïc Le Gratiet (EDF R&D), Jérome Lonchampt (EDF R&D)
    Primary area of focus / application: Design and analysis of experiments
    Secondary area of focus / application: Modelling
    Keywords: Asset management, Stochastic simulator, Computer experiments, Gaussian process, Uncertainty, Adaptive design, Optimization
    Submitted at 7-Apr-2015 14:24 by Bertrand Iooss
    Accepted (view paper)
    8-Sep-2015 11:40 Optimization of Stochastic Simulators by Gaussian Process Metamodelling – Application to Maintenance Investments Planning Problems
    EDF seeks to assess and optimize its strategic investment decisions for its electricity production assets by using probabilistic and optimization methods applied to the "cost of maintenance strategies". In order to quantify the technical and economic impact of a candidate maintenance strategy, economic indicators are evaluated by Monte Carlo simulations using the VME software developed by EDF R&D (hereafter called the "stochastic simulator"). The main output of the Monte Carlo simulation process in VME is the probability distribution of the Net Present Value (NPV) associated with the maintenance strategy. From this distribution, indicators such as the NPV mean, the NPV standard deviation or the regret investment probability (Prob(NPV < 0)) can easily be derived. Once these indicators have been obtained, one is interested in optimizing the strategy, for instance by determining the optimal investment dates leading to the highest mean NPV and the lowest regret investment probability. Due to the discrete nature of the events to be optimized, the optimization method is based on genetic algorithms.
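The indicators named above can be read directly off the Monte Carlo NPV sample. A minimal sketch with synthetic numbers standing in for VME output (VME itself is an EDF-internal code, so the sample here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for VME output: a Monte Carlo sample of NPV values
npv = rng.normal(loc=5.0, scale=10.0, size=100_000)

mean_npv = npv.mean()             # expected NPV of the strategy
std_npv = npv.std(ddof=1)         # dispersion of the NPV
regret_prob = np.mean(npv < 0.0)  # regret investment probability Prob(NPV < 0)

print(mean_npv, std_npv, regret_prob)
```

An optimizer can then rank candidate strategies by any of these three scalars, or trade them off against each other.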

    During the optimization process, one of the main issues is the computational cost of the stochastic simulator to be optimized, which calls for methods requiring minimal simulator runs. The solution investigated in this study is to develop and use a metamodel in place of the simulator within the mathematical optimization algorithm. Built from a first set of simulator runs (called the learning sample and coming from a specific design of experiments), a metamodel approximates the simulator outputs by a mathematical model. This metamodel can then predict the simulator outputs for other input configurations. Many metamodelling techniques are available in the computer experiments literature. However, these conventional methods are not suitable in the present framework because of the stochastic nature of the simulator: the output of interest is not a single scalar variable but a full probability density function (or a cumulative distribution function, or a quantile function).

    We first propose to build a metamodel of the stochastic simulator using the following key points:
    1) Emulation of the quantile function, which proves more efficient than emulation of the probability density function;
    2) Decomposition of the quantile function as a sum of quantile functions coming from the learning sample outputs;
    3) Selection of the most representative quantile functions of this decomposition using an adaptive choice algorithm (called the modified magic points algorithm) in order to have a small number of terms in the decomposition;
    4) Emulation of each coefficient of this decomposition by a Gaussian process metamodel.

    The metamodel is then used to treat a simple maintenance strategy optimization problem with the VME code, in order to optimize an NPV quantile. Within the Gaussian process metamodel framework, an adaptive design method can be defined by extending the well-known EGO (Efficient Global Optimization) algorithm to our case. This makes it possible to obtain an "optimal" solution using a small number of VME simulator runs.
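The four steps above can be sketched in a toy setting. Everything below is an assumption-laden stand-in: a simple synthetic stochastic simulator replaces VME, the "modified magic points" selection is replaced by a crude choice of two basis curves, and scikit-learn's default Gaussian process serves as the coefficient emulator of step 4:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
probs = np.linspace(0.01, 0.99, 50)

# Hypothetical stochastic simulator: at input x it returns a sample whose
# distribution depends on x (a normal with x-dependent mean and spread)
def simulator(x, n=2000):
    return rng.normal(loc=np.sin(3 * x), scale=0.2 + 0.3 * x, size=n)

# Learning sample: empirical quantile functions at a few design points
X = np.linspace(0.0, 1.0, 8)
Q = np.array([np.quantile(simulator(x), probs) for x in X])  # (8, 50)

# Crude stand-in for the "modified magic points" selection: keep two of the
# learning quantile functions as a basis, then obtain the coefficients of
# every learning curve on that basis by least squares (steps 2-3)
basis = Q[[0, -1]].T                                  # (50, 2)
coefs, *_ = np.linalg.lstsq(basis, Q.T, rcond=None)   # (2, 8)

# Step 4: one Gaussian process metamodel per decomposition coefficient
gps = [GaussianProcessRegressor().fit(X[:, None], c) for c in coefs]

def predict_quantile_function(x_new):
    c = np.array([gp.predict(np.array([[x_new]]))[0] for gp in gps])
    return basis @ c            # predicted quantile function at a new input

q_pred = predict_quantile_function(0.5)
```

The predicted quantile function can then be queried at any probability level by an EGO-style adaptive optimizer without further simulator runs.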
  • Data Mining in Direct Marketing - Attribute Construction and Decision Tree Induction

    Authors: Petra Perner (Institut for Computer Vision and Applied Computer Sciences), Andrea Ahlemeyer-Stubbe (draftcom)
    Primary area of focus / application: Mining
    Secondary area of focus / application: Business
    Keywords: Direct marketing, Data analysis, Decision tree induction, Mailing action
    Submitted at 10-Apr-2015 10:15 by Petra Perner
    7-Sep-2015 10:20 Data Mining in Direct Marketing - Attribute Construction and Decision Tree Induction
    There are many companies world-wide that collect addresses and life-style information about consumers, businesses and market places as they come in. These databases are often not set up for a special purpose and contain a lot of information. This makes data mining more difficult than when the database only includes data collected for building a specific classification model. In this paper we show how decision tree induction can be used to learn the profile of an "Internet User" versus a "No Internet User". Attribute construction based on domain knowledge was necessary, as the available database was not a special-purpose one. Decision tree induction using C4.5 and decision tree induction based on the minimum description length (MDL) principle were used to train the classification model. The decision tree induction methods based on the MDL principle showed the best results in terms of error rate and explanatory capability. There can be enormous financial savings in customer mailings if high-potential customers are identified using data mining. It is therefore always better to employ data mining methods rather than simply mailing all known addresses.
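A hedged illustration of the workflow described above, on synthetic data since the paper's database and its constructed attributes are not public; note that scikit-learn grows CART trees, not the C4.5 or MDL-based trees used in the paper, but the profiling workflow is the same:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)

# Hypothetical life-style attributes for 1000 consumers
n = 1000
age = rng.integers(18, 80, size=n)
urban = rng.integers(0, 2, size=n)      # 1 = lives in a city
# Synthetic ground truth: younger, urban consumers are internet users
internet_user = ((age < 45) & (urban == 1)).astype(int)

X = np.column_stack([age, urban])
tree = DecisionTreeClassifier(max_depth=3).fit(X, internet_user)

# The induced tree is itself the "profile": a readable set of rules
print(export_text(tree, feature_names=["age", "urban"]))
score = tree.score(X, internet_user)
```

Scoring all addresses with such a tree and mailing only the predicted "Internet User" leaves is what produces the mailing-cost savings mentioned above.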
  • An Approach for Monitoring Count Data Using Principal Components Regression-Based Control Charts

    Authors: Danilo Marcondes Filho (Federal University of Rio Grande do Sul), Ângelo Márcio Oliveira Sant’Anna (Federal University of Bahia)
    Primary area of focus / application: Process
    Secondary area of focus / application: Modelling
    Keywords: Poisson model, Model-based control chart, PCA, PCA regression, PCA-based control chart.
    Submitted at 14-Apr-2015 22:48 by Danilo Marcondes Filho
    Control charts based on regression models are appropriate for monitoring processes in which the quality characteristics of products vary depending on adjustments of process variables (or input variables). Their use enables monitoring the correlation structure between input variables and the response variable through the residuals of a model fitted to historical process data. However, this strategy has two limitations: (i) it is restricted to input variables which are not significantly correlated (that is, input data without multicollinearity); (ii) it does not allow diagnosis of disturbances in the process reflected by changes in the control variables that do not directly affect the response variable, that is, that do not affect the residuals monitored by the control chart. This paper proposes a strategy for monitoring count data in a laminated plastic plywood process, combining Poisson regression and principal component analysis. In this strategy, collinear variables are turned into uncorrelated variables by principal component analysis and a Poisson regression is performed on the principal component scores. A set of control charts, including charts on the residuals from the proposed model and PCA-based charts, is then used to evaluate the laminated plastic plywood manufacturing process.
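The proposed strategy can be sketched on synthetic in-control data (the real plywood process data is not public): PCA removes the collinearity, a Poisson regression is fitted on the scores, and Pearson residuals feed a 3-sigma-style chart; charts on the scores themselves would complement this, addressing limitation (ii):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(3)

# Hypothetical in-control data: three collinear process variables and a
# count response (e.g. defects per plywood sheet)
n = 500
z = rng.normal(size=n)
Xraw = np.column_stack([z + 0.1 * rng.normal(size=n),
                        2 * z + 0.1 * rng.normal(size=n),
                        rng.normal(size=n)])
y = rng.poisson(np.exp(0.5 + 0.8 * z))

# Step 1: turn collinear inputs into uncorrelated principal component scores
pca = PCA(n_components=2).fit(Xraw)
scores = pca.transform(Xraw)

# Step 2: Poisson regression on the scores (tiny penalty ~ plain GLM fit)
glm = PoissonRegressor(alpha=1e-4).fit(scores, y)

# Step 3: Pearson residuals for the model-based control chart
mu_hat = glm.predict(scores)
resid = (y - mu_hat) / np.sqrt(mu_hat)
ucl, lcl = 3.0, -3.0
out_of_control = np.mean((resid > ucl) | (resid < lcl))
```

In production use, the PCA and GLM would be fitted on a historical in-control phase and new observations charted against the fixed limits.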
  • Early Detection of Long Term Evaluation Criteria in Online Controlled Experiments

    Authors: Prof David Steinberg (Tel Aviv University), Yoni Schamroth (Perion Networks), Boris Rabinovich (Perion Networks), Liron Gat Kahalon (Perion Network)
    Primary area of focus / application: Design and analysis of experiments
    Secondary area of focus / application: Modelling
    Keywords: GLM, Re-sampling statistics, Lifetime value, Controlled experiments
    Submitted at 15-Apr-2015 11:31 by Yoni Schamroth
    Controlled experimentation has been universally adopted by the online world as an essential tool for aiding the decision-making process and has been widely recognized as a successful scientific method for establishing causality. Frequently referred to as A/B testing or multivariate testing, controlled experiments provide a relatively straightforward method for quickly discovering the expected impact of new features or strategies. One of the main challenges involved in setting up an experiment is deciding upon the OEC, or overall evaluation criterion. In this paper, we demonstrate the importance of choosing a metric that focuses on long-term effects. Such metrics include measures such as life-span or lifetime value. We present an innovative methodology for early detection of lifetime differences between test groups. Finally, we present motivating examples where failure to focus on the long-term effect may result in an incorrect conclusion.
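One simple way to compare an early lifetime-value metric between two test groups, in the spirit of the re-sampling statistics listed in the keywords, is a permutation test; the data below is synthetic and the paper's actual methodology may well differ:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical early lifetime-value observations for a control and a test
# group of an online controlled experiment
control = rng.exponential(scale=10.0, size=400)
test = rng.exponential(scale=12.0, size=400)

observed = test.mean() - control.mean()

# Permutation test: reshuffle group labels to estimate how often a
# difference at least this large appears under "no true difference"
pooled = np.concatenate([control, test])
n_perm, count = 5000, 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    if perm[400:].mean() - perm[:400].mean() >= observed:
        count += 1
p_value = (count + 1) / (n_perm + 1)
```

Running such a test repeatedly as data accrues requires multiple-testing corrections, which is part of what makes principled early detection non-trivial.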