ENBIS-15 in Prague

6 – 10 September 2015; Prague, Czech Republic
Abstract submission: 1 February – 3 July 2015

My abstracts


The following abstracts have been accepted for this event:

  • Data Visualisation Books and Blogs: An Overview

    Authors: Gaj Vidmar (University Rehabilitation Institute, Republic of Slovenia)
    Primary area of focus / application: Other: Graphics and Graphical Models for Illuminating Data
    Secondary area of focus / application: Education & Thinking
    Keywords: Data visualisation, Statistical graphics, Books, Blogs, Reviews, Communication, Education
    Submitted at 16-May-2015 11:20 by Gaj Vidmar
    8-Sep-2015 11:40 Data Visualisation Books and Blogs: An Overview
    Over the past few decades, and particularly during the last ten years, data visualisation has undergone rapid development from a topic of marginal academic interest through a technological hype to a mainstream information tool (or the latter transition has at least begun). In this talk, I will attempt to give a comprehensive overview of the books published so far (or presently in print) on data visualisation, as well as of leading data visualisation-related blogs of global interest and reach. The overview of those resources will be based on an ad-hoc taxonomy, tentatively including the following categories: Pioneers and Founders, Statistical Visualisation (subcategories: Categorical Data, Interactive/Dynamic, Geographic), Popular/General Audience, History, Cognitive Psychology, Software-Oriented (subcategories: R, Other), Networks, Information Visualization (subcategories: Computer Science, Data Journalism, Coffee-Table), and Scientific Visualization. Together with the overview, I will present some of my opinions, criticisms and recommendations regarding developments and practices in data visualisation, including the issues of data visualisation in scientific publishing and the role of data visualisation in public education.
  • Simulating and Analyzing Experiments in the Tennessee Eastman Process Simulator

    Authors: Francesca Capaci (Luleå University of Technology), Bjarne Bergquist (Luleå University of Technology), Erik Vanhatalo (Luleå University of Technology), Murat Kulahci (Technical University of Denmark)
    Primary area of focus / application: Design and analysis of experiments
    Keywords: Experimental design, Tennessee Eastman process simulator, Autocorrelated data, Continuous process
    Submitted at 19-May-2015 15:53 by Francesca Capaci
    7-Sep-2015 15:35 Simulating and Analyzing Experiments in the Tennessee Eastman Process Simulator
    In many of today’s continuous processes, data collection is usually performed automatically, yielding exorbitant amounts of data on various quality characteristics and inputs to the system. Moreover, such data are usually collected at high frequency, introducing significant serial dependence in time. This violates the independence assumption behind many industrial statistics methods used in process improvement studies. These studies often involve controlled experiments to unearth the causal relationships to be used for robustness and optimization purposes.

    However, real production processes are not suitable for studying new experimental methodologies, partly because unknown disturbances/experimental settings may lead to erroneous conclusions. Moreover, large-scale experimentation in production processes is frowned upon due to the consequent disturbances and production delays. Hence, realistic simulation of such processes offers an excellent opportunity for experimentation and methodological development.

    One commonly used process simulator is the Tennessee Eastman (TE) challenge chemical process simulator. The process produces two products from four reactants and comprises 41 measured variables and 12 manipulated variables. In addition to the process description, the problem statement defines process constraints, 20 types of process disturbances, and six operating modes corresponding to different production rates and mass ratios in the product stream.

    The purpose of this paper is to illustrate the use of the TE process, with an appropriate feedback control, as a test-bed for the methodological development of new experimental design and analysis techniques.
    The paper illustrates how two-level experimental designs can be used to identify how the input factors affect the outputs in a chemical process (a simplified illustration is sketched after this abstract).

    Simulations using the Matlab/Simulink software are used to study the impact of, e.g., process disturbances, closed-loop control and autocorrelated data on different experimental arrangements.
    The experiments are analysed using a time series analysis approach to identify input-output relationships in a process operating in closed-loop with multivariate responses. The dynamics of the process are explored and the necessary run lengths for stable effect estimates are discussed.
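    As a rough, hypothetical illustration of the two-level designs referred to above (not the authors' Matlab/Simulink implementation), the Python sketch below builds a 2^3 full factorial in three placeholder manipulated variables and estimates main effects by contrast averaging on responses carrying simulated AR(1) noise, mimicking the autocorrelation issue discussed in the abstract. Factor names, effect sizes and run lengths are assumptions made purely for illustration.

        import itertools
        import numpy as np

        rng = np.random.default_rng(1)

        # 2^3 full factorial in coded units (-1/+1); the factor names are
        # hypothetical stand-ins for manipulated variables of the TE simulator.
        factors = ["reactor_temp_sp", "recycle_valve", "purge_valve"]
        design = np.array(list(itertools.product([-1, 1], repeat=len(factors))))

        # Placeholder response for one run: a level shift (assumed true effects)
        # plus an AR(1) disturbance, standing in for an autocorrelated TE output.
        def run_simulation(x, n=200, phi=0.8):
            drift = 1.5 * x[0] - 0.7 * x[1] + 0.2 * x[2]
            e = np.zeros(n)
            for t in range(1, n):
                e[t] = phi * e[t - 1] + rng.normal(scale=0.5)
            return drift + e

        # Summarise each run by the mean of its post-transient response.
        y = np.array([run_simulation(x)[50:].mean() for x in design])

        # Main-effect estimate: mean response at the +1 level minus at the -1 level.
        for j, name in enumerate(factors):
            effect = y[design[:, j] == 1].mean() - y[design[:, j] == -1].mean()
            print(f"{name}: estimated main effect = {effect:.2f}")

    In the study itself the responses come from the closed-loop TE simulator and are analysed with time series methods rather than simple run averages.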
  • Projection Properties of Mixed Level Designs in Fewer Runs

    Authors: John Sølve Tyssedal (Department of Mathematical Sciences, The Norwegian University of Science and Technology), Muhammad Azam Chaudhry (Department of Mathematical Sciences, The Norwegian University of Science and Technology)
    Primary area of focus / application: Design and analysis of experiments
    Secondary area of focus / application: Design and analysis of experiments
    Keywords: Projectivity, Mixed level designs, Two-level factor, Three-level factor, Projection properties, Minimum run design
    Submitted at 19-May-2015 22:43 by Muhammad Azam Chaudhry
    Accepted (view paper)
    7-Sep-2015 15:55 Projection Properties of Mixed Level Designs in Fewer Runs
    Full factorial mixed-level designs can become very large in terms of run number, depending on the number of factors and their levels. For example, a full factorial design for three factors with a 2-level, a 3-level and a 5-level factor already requires 2 × 3 × 5 = 30 runs. It may therefore be desirable to use a non-regular mixed-level design instead, the advantages being run-size economy and partial aliasing, which makes it possible to entertain some interactions as well.
    In this paper we investigate the projection properties of 12-run designs with two-level and three-level factors. Our recommended design allows all main effects, all two-factor interactions and quadratic effects to be estimated when projected onto three factors (a simple projectivity check for a related two-level 12-run design is sketched after this abstract).
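    As a hedged illustration of what a projection-property check involves (this is not the authors' mixed 2-/3-level construction, which is not reproduced here), the Python sketch below builds the classical two-level 12-run Plackett-Burman design and verifies that every projection onto three factors contains all 2^3 = 8 level combinations, i.e. that the design has projectivity 3.

        from itertools import combinations
        import numpy as np

        # 12-run Plackett-Burman design: 11 cyclic shifts of the standard
        # generator row plus a final row of -1s.
        gen = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])
        pb12 = np.array([np.roll(gen, k) for k in range(11)]
                        + [-np.ones(11, dtype=int)])

        # Projectivity check: the design has projectivity 3 if every projection
        # onto 3 columns contains all 2^3 = 8 distinct level combinations.
        projectivity_3 = True
        for cols in combinations(range(11), 3):
            if len({tuple(row) for row in pb12[:, list(cols)]}) < 8:
                projectivity_3 = False
                print("incomplete projection onto columns", cols)
        print("PB12 has projectivity 3:", projectivity_3)

    Whether main effects, two-factor interactions and quadratic effects remain estimable when some columns are replaced by three-level factors is the question the paper itself addresses.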
  • A Picture is Worth a Thousand Words. A Video is Worth a Thousand Pictures. Applications in Statistics.

    Authors: Bernard Francq (Université Catholique de Louvain, University of Glasgow)
    Primary area of focus / application: Education & Thinking
    Keywords: Teaching statistics, Learning statistics, Animated graph, Videos
    Submitted at 20-May-2015 12:22 by Bernard Francq
    9-Sep-2015 09:00 A Picture is Worth a Thousand Words. A Video is Worth a Thousand Pictures. Applications in Statistics.
    The new technologies and computer-based tools for learning and teaching have considerably increased in recent years. From textbooks to the use of multimedia, a wide variety of tools can now enhance the learning process. This is particularly important in mathematics and statistics, which are often considered difficult and abstract.

    This presentation will illustrate, with a variety of videos and animated graphs, several tools that may prove useful in the challenge of teaching at different levels and for a wide diversity of students (undergraduate, (post)graduate, CPD, clients in consultancy or multi-national conference delegates – to name but a few). Videos and animations are attractive tools to heighten participant engagement with the subject, and can lead to a better understanding of the material and dissemination of knowledge.

    During this presentation, the incorporation of videos and animations in teaching will be discussed with several examples:
    - Screencast, which may be useful to demonstrate the use of specialised software
    - Podcast, to illustrate practical solving of problems
    - Animation, to explain a statistical concept or to provide the ‘visual proof’ of a theorem

    The following questions will be discussed:
    - Which tools are best suited to which audience?
    - How can these tools be incorporated into your teaching?
    - Is there a risk that these new methods may replace the instructor?

    To conclude, it will be shown that ‘if a picture is worth a thousand words, then a video is worth a million’. In tough and abstract fields especially, the use of multimedia can facilitate the visualization of problems from new and entertaining angles.
  • The Bayes Factor for Computer Model Validation

    Authors: Guillaume Damblin (EDF R&D), Merlin Keller (EDF R&D), Pierre Barbillon (AgroParisTech), Alberto Pasanisi (EIFER), Eric Parent (AgroParisTech)
    Primary area of focus / application: Modelling
    Secondary area of focus / application: Quality
    Keywords: Code validation, Hypothesis testing, Model selection, Intrinsic Bayes factor, Fractional Bayes factor
    Submitted at 21-May-2015 15:22 by Merlin Keller
    Accepted (view paper)
    8-Sep-2015 10:50 The Bayes Factor for Computer Model Validation
    We introduce a new approach for the validation of a computer model (or “code”) simulating the behavior of a physical system of interest, based on the results of both physical and computer experiments. The key idea is to formulate code validation as a statistical test, confronting the point null hypothesis that the code predicts the behavior of the physical system “perfectly” (up to the inevitable measurement errors), with the composite alternative that code predictions are tainted with a systematic bias.

    When the code depends on uncertain parameters, the null hypothesis becomes the composite one that there exists a so-called “true” value of the parameter vector which yields a perfect adjustment of code predictions to the physical system. The alternative hypothesis is then that each distinct value of the parameter vector defines a nonzero model error function between code predictions and the actual physical system.

    Such a test can be performed in a frequentist fashion, by deriving the null distribution of a certain test statistic, such as the sum of squares of the differences between code predictions and physical measurements. Though popular, this approach suffers from several known limitations. In particular, because statistical tests typically control the type I (false rejection) error rate, it is never possible to accept the null hypothesis (i.e., validate the computer model) with a given level of confidence.

    To overcome such limitations, we propose to recast this statistical test in the more general framework of model selection, performed in a Bayesian fashion, i.e. by computing the Bayes factor between the null and alternative hypotheses. The Bayes factor can be interpreted as their posterior odds, given equal prior probabilities. Hence, it can be used to effectively “validate” the computer model when it takes large enough values, according for instance to Jeffreys’ scale of evidence.

    Under the assumption that the code outputs depend linearly on the uncertain parameters, we show how to compute the Bayes factor. However, the choice of a prior distribution under each hypothesis remains a difficult challenge, as it substantially influences the resulting value of the Bayes factor. This is especially true for small datasets and vague prior information, a typical setting in the field of computer experiments.

    We compare several solutions to this problem, such as the intrinsic and the fractional Bayes factor. Both approaches use a fraction of the dataset to update an initially vague or improper prior into a proper informative posterior distribution. The latter is then used as a prior to compute the Bayes factor, based on the remaining data. These methods are tested on synthetic datasets, and then applied to an industrial test case, dealing with a computer model simulating the electricity produced by a power plant. (A generic formulation of the Bayes factor is recalled after this abstract.)
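    For reference, and in generic notation that is not necessarily the authors', the Bayes factor between the two hypotheses and its fractional variant can be written as follows, where b is the fraction of the likelihood used to train the prior:

        \[
        B_{01} \;=\; \frac{m_0(y)}{m_1(y)}, \qquad
        m_i(y) \;=\; \int p(y \mid \theta_i, H_i)\, \pi_i(\theta_i)\, \mathrm{d}\theta_i ,
        \]
        so that, with equal prior probabilities $P(H_0)=P(H_1)$, the posterior odds of $H_0$ against $H_1$ equal $B_{01}$; the fractional Bayes factor replaces improper priors by likelihood-trained ones,
        \[
        B^{\mathrm{F}}_{01}(b) \;=\; \frac{m_0(y)\,/\,m_0(y;b)}{m_1(y)\,/\,m_1(y;b)},
        \qquad
        m_i(y;b) \;=\; \int p(y \mid \theta_i, H_i)^{\,b}\, \pi_i(\theta_i)\, \mathrm{d}\theta_i .
        \]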
  • Application of Random Forests to Create Task-Based Control for a Parallel Hybrid Forklift – A Case Study

    Authors: Koen Rutten (Flanders Make), Beau Piccart (Flanders Make), Catalin Stefan Teodorescu (Flanders Make), Bruno Depraetere (Flanders Make)
    Primary area of focus / application: Mining
    Secondary area of focus / application: Process
    Keywords: Classification, Control, Vehicle, Case-study
    Submitted at 26-May-2015 17:38 by Koen Rutten
    7-Sep-2015 15:55 Application of Random Forests to Create Task-Based Control for a Parallel Hybrid Forklift – A Case Study
    Controlling the power-split between the electrical and internal combustion engines is of the utmost importance in hybrid automotive systems to reduce fuel consumption. The Equivalent Consumption Minimization Strategy (ECMS) is one type of control that can be applied to this case. However, depending on the type of task the system is performing, the optimal controller parameters can change. In this case study, a Random Forest classifier is presented as a means to predict the type of task being performed on a parallel hybrid forklift, using sensor information that is available on the vehicle. Applying the cluster prediction to test data shows that a new task can be accurately identified within 15 seconds. This approach is used to calculate cluster-specific controllers that become active once the task is identified. It is shown, in simulation, that the fuel efficiency of the vehicle increases by 34.7% when using these task-based controllers compared to a global robust controller (a minimal classification sketch follows below).
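    As a hedged sketch of the classification step only (the sensor features, task labels and data below are invented placeholders, and the ECMS controller switching itself is not shown), a random forest task classifier could be set up along these lines in Python:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)

        # Placeholder data: each row summarises a short window of on-board
        # sensor signals (hypothetical features, not the forklift's actual
        # sensor set); the label is the task cluster being performed.
        n = 600
        X = np.column_stack([
            rng.normal(size=n),         # e.g. mean traction-motor current
            rng.normal(size=n),         # e.g. variance of lift-cylinder pressure
            rng.uniform(size=n),        # e.g. fraction of time driving in reverse
        ])
        y = rng.integers(0, 3, size=n)  # e.g. three task clusters

        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        # Random forest predicting the task cluster; once a cluster is identified
        # on-line, the corresponding cluster-specific ECMS parameters would be
        # activated (that switching logic is outside this sketch).
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(X_train, y_train)
        print("held-out accuracy:", clf.score(X_test, y_test))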