ENBIS-17 in Naples

9 – 14 September 2017; Naples (Italy)
Abstract submission: 21 November 2016 – 10 May 2017

My abstracts

 

The following abstracts have been accepted for this event:

  • Clustering Functional Data Stream: Theory and Application for the Analysis of Web Data

    Authors: Tonio Di Battista (Università G. d'Annunzio), Fabrizio Maturo (Università G. d'Annunzio), Francesca Fortuna (Università G. d'Annunzio)
    Primary area of focus / application: Other: Italian SIS session on Statistics in Data Science
    Keywords: Clustering functional data, Data streaming, Data classification, FDA
    Submitted at 1-Mar-2017 14:38 by Tonio Di Battista
    Accepted (view paper)
    12-Sep-2017 12:10 Clustering Functional Data Stream: Theory and Application for the Analysis of Web Data
    In recent decades, the analysis of web data has become crucial for many companies. Most enterprises have a website, and analyzing how consumers behave when they surf the web is essential for making or changing corporate strategies. Information about the number of visits, bounce rate, time on site, traffic sources, and the success of online advertising is essential for understanding if and where firms are losing customers, forecasting future sales, understanding past performance, profiling customer behaviour, monitoring buying patterns, measuring the impact of site changes, and removing barriers to sale.
    Therefore, it is widely recognized that analyzing the strategic aspects of a website, devising a marketing strategy, and understanding typical visitors are highly relevant topics in both business administration and statistics.
    Analyzing this type of data often means dealing with big data and data streams. The main issues in analyzing web traffic and web data are that they often flow continuously from a source and are potentially unbounded in size, which makes storing the whole dataset infeasible. Moreover, the results of inspecting this kind of data are most useful when the analytics are available in "real time" because, in several fields, the value of information decreases with time. Furthermore, the analyst often cannot reanalyze the data after it has streamed past, so it is important to use appropriate tools.
    In this paper, we propose an alternative method for clustering functional data streams that complements existing techniques, and we address phenomena in which web data are expressed as a curve or a function.
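As an aside, below is a minimal sketch of how such a streaming pipeline can be organised: each arriving curve is projected onto a fixed basis, and only the coefficient vectors are clustered incrementally. This illustrates the general idea rather than the authors' method; the Legendre basis, batch size, and number of clusters are arbitrary assumptions.

```python
# Minimal sketch: clustering streaming curves via basis coefficients.
# Assumes each arriving "curve" is a vector of web-traffic values observed
# on a common grid; all names and parameters below are hypothetical.
import numpy as np
from numpy.polynomial.legendre import legvander
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 48)          # e.g. 48 half-hour slots per day
basis = legvander(2.0 * grid - 1.0, 7)    # Legendre basis on [-1, 1], degree 7

def to_coefficients(curve):
    """Project one observed curve onto the basis (least squares)."""
    coef, *_ = np.linalg.lstsq(basis, curve, rcond=None)
    return coef

model = MiniBatchKMeans(n_clusters=3, random_state=0)

# Simulated stream: batches of daily traffic curves arrive; only their
# coefficients are kept, so the raw (unbounded) stream is never stored.
for _ in range(100):
    freq = rng.integers(1, 4, size=(20, 1))
    batch = np.sin(2 * np.pi * freq * grid) + rng.normal(0.0, 0.3, (20, 48))
    coefs = np.array([to_coefficients(c) for c in batch])
    model.partial_fit(coefs)

print("cluster centres in coefficient space:", model.cluster_centers_.shape)
```

Because only basis coefficients and cluster centroids are retained, memory use stays bounded no matter how long the stream runs.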
  • Social Media Big Data Integration: A New Approach Based on Calibration

    Authors: Luciana Dalla Valle (Plymouth University), Ron Kenett (KPA Ltd. and University of Turin)
    Primary area of focus / application: Quality
    Secondary area of focus / application: Mining
    Keywords: Bayesian networks, Big Data, Calibration, Data integration, Social Media, Information quality
    Submitted at 2-Mar-2017 13:05 by Luciana Dalla Valle
    Accepted
    12-Sep-2017 15:40 Social Media Big Data Integration: A New Approach Based on Calibration
    Recent years have seen an unprecedented growth in the availability of huge amounts of information, generated in every sector at high speed and in a wide variety of formats; this is what is known as “big data”. The ability to harness big data, such as social media, is an opportunity to obtain more accurate analyses and improved decision-making in industry, government and many other organizations. On the other hand, handling big data can be challenging, and proper data integration is a key dimension in achieving high information quality (Kenett and Shmueli, 2016). In this paper, we apply a data integration approach that calibrates online-generated big data with organizational or administrative data using Bayesian networks. The methodology combines different data sources by identifying overlapping links that are used for calibration and enhanced information quality. It expands earlier work by the authors that focused on integrating official statistics with administrative data and data from various surveys.
    We illustrate the application of the methodology with an example of integration between online data from blogs and customer satisfaction surveys. This demonstrates how the methodology enhances the information quality (InfoQ) of a study along four of the InfoQ dimensions: Data Structure, Data Integration, Temporal Relevance, and Chronology of Data and Goal.
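As a toy illustration of calibration against overlapping links, the sketch below linearly recalibrates a big-data signal using survey scores on the weeks both sources share. The paper's actual methodology is based on Bayesian networks over the shared variables; all variable names and numbers here are invented.

```python
# Toy sketch of overlap-based calibration (illustrative only; the paper's
# methodology uses Bayesian networks rather than this linear shortcut).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Big-data source: weekly blog sentiment (hypothetical scale, noisy, biased).
blogs = pd.DataFrame({"week": np.arange(52),
                      "sentiment": rng.normal(0.2, 1.0, 52)})

# Survey source: customer satisfaction, observed only on 12 of the weeks and
# (by construction here) linearly linked to the signal behind the sentiment.
weeks = np.sort(rng.choice(52, size=12, replace=False))
survey = pd.DataFrame({
    "week": weeks,
    "satisfaction": 3.0 + 0.8 * blogs.loc[weeks, "sentiment"].values
                    + rng.normal(0.0, 0.1, 12),
})

# Overlapping links: weeks present in both sources.
overlap = blogs.merge(survey, on="week")

# Calibrate: regress survey scores on the big-data signal over the overlap,
# then re-express the full big-data stream on the survey scale.
slope, intercept = np.polyfit(overlap["sentiment"], overlap["satisfaction"], 1)
blogs["calibrated"] = intercept + slope * blogs["sentiment"]
print(blogs.head())
```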
  • A Sequential Hypothesis Test for the Process Capability Index Cpk

    Authors: Michele Scagliarini (University of Bologna)
    Primary area of focus / application: Process
    Secondary area of focus / application: Process
    Keywords: Average sample size, Maximal allowable sample size, Power function, Process capability, Sequential test, Simulation study
    Submitted at 2-Mar-2017 13:30 by Michele Scagliarini
    Accepted (view paper)
    11-Sep-2017 17:30 A Sequential Hypothesis Test for the Process Capability Index Cpk
    In this work, we propose a sequential test for the process capability index Cpk. We study the statistical properties of the sequential test through an extensive simulation study, examining the average sample size needed to decide correctly under H0 and H1, the maximum allowable sample size required to achieve a pre-set power level, and whether the empirical type I error probability stays within the nominal α-level of the test. We compared the performance of the sequential procedure with that of two non-sequential tests. The results show that the proposed test yields, on average, smaller stopping sample sizes than the fixed-sample-size tests while maintaining the desired α-level and power. Furthermore, the maximum allowable sample sizes required by the sequential test to achieve the desired power level are smaller than, or at most equal to, the sample sizes required by the non-sequential tests: even in the worst case, the sequential procedure uses a sample size that does not exceed that of the non-sequential tests with the same power level (under H1) and without exceeding the type I error probability (under H0).
    In summary, the proposed sequential procedure offers a substantial decrease in sample size compared with the non-sequential tests, while the type I and II error probabilities are correctly maintained at their desired values. An illustrative simulation sketch of such a procedure is given after the references below.

    References
    Hussein, A., Ahmed, S.E., Bhatti, S. (2012). Sequential testing of process capability indices. Journal of Statistical Computation and Simulation, 82(2), 279-282.
    Lepore, A., Palumbo, B. (2015). New Insights into the Decisional Use of Process Capability Indices via Hypothesis Testing. Quality and Reliability Engineering International, 31(8), 1725-1741.
    Pearn, W.L., Chen, K.S. (1999). Making decisions in assessing process capability index Cpk. Quality and Reliability Engineering International, 15(4), 321-326.
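Below is a minimal simulation sketch of a sequential capability test. It is not the paper's exact procedure: it stops as soon as a large-sample confidence interval for the estimated Cpk clears one of two hypothesised values, and the thresholds, stopping rule, and distributional settings are illustrative assumptions.

```python
# Illustrative sequential-style test for Cpk (not the paper's exact method):
# observations arrive one at a time, and sampling stops as soon as a
# large-sample confidence interval for the estimated Cpk clears c0 or c1.
import numpy as np
from scipy import stats

def cpk(x, lsl, usl):
    m, s = x.mean(), x.std(ddof=1)
    return min(usl - m, m - lsl) / (3.0 * s)

def sequential_cpk_test(sample_fun, lsl, usl, c0=1.0, c1=1.33,
                        alpha=0.05, n_min=10, n_max=500):
    """Decide between H0: Cpk <= c0 and H1: Cpk >= c1; return (decision, n)."""
    z = stats.norm.ppf(1.0 - alpha)
    x = list(sample_fun(n_min))
    for n in range(n_min, n_max + 1):
        est = cpk(np.asarray(x), lsl, usl)
        # Large-sample variance approximation for the Cpk estimator.
        se = np.sqrt(1.0 / (9.0 * n) + est**2 / (2.0 * (n - 1)))
        if est - z * se > c0:      # lower bound clears c0: conclude capable
            return "H1", n
        if est + z * se < c1:      # upper bound below c1: conclude not capable
            return "H0", n
        x.append(sample_fun(1)[0])
    return "no decision", n_max

rng = np.random.default_rng(2)
draw = lambda n: rng.normal(10.0, 0.6, n)   # true Cpk = 3 / (3 * 0.6) ~ 1.67
print(sequential_cpk_test(draw, lsl=7.0, usl=13.0))
```

Running it prints the decision and the stopping sample size, which under these illustrative settings stays well below the n_max cap, mirroring the sample-size savings discussed above.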
  • How to Classify Facial Skin Colour: A Comparative Study of 3 Different Investigative Methods

    Authors: Emmanuelle Mauger (Chanel PB), Gabriel Cazorla (Chanel PB), Guylaine Legendre (Chanel PB), Julie Latreille (Chanel PB), Frédérique Morizot (Chanel PB)
    Primary area of focus / application: Other: Statistics for cosmetics
    Keywords: Skin colour, Classification, Principal Component Analysis, Free sorting task, Multidimensional scaling
    Submitted at 2-Mar-2017 17:51 by Emmanuelle Mauger
    Accepted (view paper)
    12-Sep-2017 18:40 How to Classify Facial Skin Colour: A Comparative Study of 3 Different Investigative Methods
    Skin colour plays an important social role through its impact on the assessment of ethnicity, age, health status and attractiveness. The objective of this study is to compare digital and spectrocolorimeter skin colour measurements with human perception of skin colour. The study was conducted on 94 Caucasian women (20-40 years), on whom digital and spectrocolorimeter skin colour measurements were performed. To capture what the human eye can distinguish, a free sorting task was performed by 15 judges on a selection of 26 of the 94 pictures. Each judge had to group the pictures into as many classes as they wished. For the digital and spectrocolorimeter data, the relationships between skin colour parameters were examined using Principal Component Analysis, and typologies were established. The free sorting task was analysed using multidimensional scaling on the dissimilarity matrix, followed by a clustering method. The search for typologies of skin colour in the digital and spectrocolorimeter data led to 6 and 4 clusters, respectively. For the free sorting task, judges split the 26 pictures into 3 to 10 groups, and the search for a typology led to the identification of 5 clusters. Graphically, the three methods led to similar results, and pairwise percentage comparison tests between typologies confirmed the consistency of the three typologies. Therefore, both devices (spectrocolorimeter & digital picture) can be used to investigate the skin colour perceived by the human eye. This colour classification will be useful for recommending make-up strategies for Caucasian women. The approach will be extended to darker skin tones.
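The free-sorting analysis lends itself to a short sketch: judges' partitions are aggregated into a dissimilarity matrix, embedded with multidimensional scaling, and then clustered. The code below uses simulated sortings (26 pictures and 15 judges, mirroring the study's setup) and standard scikit-learn tools, so it illustrates the pipeline rather than reproducing the study's results.

```python
# Sketch of the free-sorting analysis: groupings -> dissimilarities -> MDS
# -> clustering. All sorting data here are simulated.
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(3)
n_pictures, n_judges = 26, 15

# Each judge partitions the pictures into 3-10 groups (random here).
sortings = [rng.integers(0, rng.integers(3, 11), n_pictures)
            for _ in range(n_judges)]

# Dissimilarity: share of judges who put two pictures in different groups.
diss = np.zeros((n_pictures, n_pictures))
for labels in sortings:
    diss += labels[:, None] != labels[None, :]
diss /= n_judges

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(diss)
clusters = AgglomerativeClustering(n_clusters=5).fit_predict(coords)
print("cluster sizes:", np.bincount(clusters))
```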
  • The Role of Sensory Assessment in Consumer’s Preferences for Sustainable Coffees: Integrating a Choice Experiment with a Guided Tasting

    Authors: Nedka Dechkova Nikiforova (Department of Statistics, Computer Science, Applications “G. Parenti”, University of Florence), Patrizia Pinelli (Department of Statistics, Computer Science, Applications “G. Parenti”, University of Florence)
    Primary area of focus / application: Other: Design of Experiment for Product Quality and Sustainability in Agri-Food Systems
    Keywords: Sustainability, Choice experiments, Bayesian optimal design, Antioxidants in coffee, Organoleptic characteristics, Consumer behavior
    Submitted at 2-Mar-2017 19:43 by Nedka Dechkova Nikiforova
    Accepted (view paper)
    11-Sep-2017 11:10 The Role of Sensory Assessment in Consumer’s Preferences for Sustainable Coffees: Integrating a Choice Experiment with a Guided Tasting
    This talk suggests an innovative approach to analyzing consumers’ preferences for sustainable coffees, integrating a choice experiment with a guided tasting for the sensory assessment. In particular, two types of coffee with different organoleptic characteristics have been chosen: an intense, soft and aromatic blend (100% Arabica), and a round coffee with high aftertaste intensity (a blend of Arabica and Robusta varieties). The two selected coffees were first analyzed with respect to their caffeine and polyphenolic antioxidant content using a High Performance Liquid Chromatography (HPLC) method. The plan of the guided tasting provides two scorecards, developed and administered for the organoleptic evaluation, in order to analyze the role of the sensory descriptors in the definition of consumers’ preferences. A choice experiment based on Bayesian optimal designs is planned in order to build choice sets aiming at: i) efficient estimation of the attributes of the choice experiment, and ii) efficient estimation of the scores obtained through the guided tasting. To this end, a compound design criterion (Atkinson et al., 2007) is applied to address both issues. Moreover, the same choice experiment is administered in two consecutive sessions, i.e. before and after the guided tasting, in order to specifically evaluate the role of tasting. All these elements, i.e. the preferences from the choice experiment and the data from the guided tasting and HPLC analysis, are subsequently analyzed with a Mixed Multinomial Logit model to better evaluate consumer behavior relating to coffee consumption. A simplified model-fitting sketch follows the reference below.

    References
    Atkinson, A. C., Donev, A.N. and Tobias R.D. (2007). Optimum Experimental Designs, with SAS. Oxford: Oxford University Press.
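As a simplified stand-in for the Mixed Multinomial Logit analysis, the sketch below fits a plain conditional logit to simulated choice data by maximum likelihood; the attributes, sample sizes, and coefficients are invented for illustration.

```python
# Minimal conditional-logit sketch for choice-experiment data (a simplified
# stand-in for the Mixed Multinomial Logit used in the talk).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n_sets, n_alts, n_attr = 200, 3, 4        # choice sets, alternatives, attributes
X = rng.normal(size=(n_sets, n_alts, n_attr))
beta_true = np.array([1.0, -0.5, 0.8, 0.3])

# Simulate choices: Gumbel errors on the utilities yield the logit model.
util = X @ beta_true + rng.gumbel(size=(n_sets, n_alts))
choice = util.argmax(axis=1)

def neg_loglik(beta):
    v = X @ beta                                     # systematic utilities
    v -= v.max(axis=1, keepdims=True)                # numerical stability
    logp = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -logp[np.arange(n_sets), choice].sum()

fit = minimize(neg_loglik, x0=np.zeros(n_attr), method="BFGS")
print("estimated attribute coefficients:", fit.x.round(2))
```

A mixed logit would additionally place a distribution over the coefficients across respondents; the likelihood above is the fixed-coefficient special case.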
  • Additive Gaussian Process Models for Improving Robustness in Computer Experiment Models

    Authors: Natalie Vollert (CTR Carinthian Tech Research AG), Michael Ortner (CTR Carinthian Tech Research AG), Jürgen Pilz (Alpen-Adria University of Klagenfurt)
    Primary area of focus / application: Design and analysis of experiments
    Secondary area of focus / application: Modelling
    Keywords: Additive models, Gaussian process, Robust modeling, Computer experiments, Reference prior, Relaxed likelihood
    Submitted at 3-Mar-2017 08:00 by Natalie Vollert
    Accepted
    11-Sep-2017 17:30 Additive Gaussian Process Models for Improving Robustness in Computer Experiment Models
    Gaussian process models are a powerful tool for emulating time-expensive computer models. However, they have certain drawbacks, especially when the underlying system features a large number of dimensions. A well-known method for dealing with high-dimensional parameter spaces, generalized additive models (GAM) [1], simplifies the model by decomposing it into a sum of univariate smooth functions. Originally, scatterplot smoothers such as splines were used as basis functions, but since a direct sum of covariance functions is again a covariance function, the concept can also be extended to GPs by giving each dimension a univariate kernel; this is referred to as Additive Gaussian Process (AGP) modeling [2,3]. The most problematic aspect of GP modeling in general is the estimation of the correlation parameters, as unsuitable values can prevent a numerically stable inversion of the covariance matrix. This procedure can be stabilized by a Bayesian approach using non-informative reference priors [4,5]. Including prior distributions permits the derivation of marginal likelihoods for the parameters of interest, and parameter estimation is then based on the corresponding marginal posterior. In this work, a robust AGP model is developed by using a separate reference prior for each univariate kernel. For the estimation of the correlation parameters, the relaxed-likelihood idea from [2] is adapted to the present problem: the mode of the posterior is determined separately for each dimension, under the assumption that the variation caused by variables not yet estimated can be approximated by white noise. An illustrative numerical sketch of the additive kernel construction is given after the references below.

    [1] Hastie, T. and Tibshirani, R. (1986). Generalized Additive Models. Statistical Science, 1, 297-318.
    [2] Durrande, N., Ginsbourger, D. and Roustant, O. (2010). Additive Kernels for High-Dimensional Gaussian Process Modeling. Technical report.
    [3] Duvenaud, D., Nickisch, H. and Rasmussen, C. E. (2011). Additive Gaussian Processes. In: Advances in Neural Information Processing Systems 24, pp. 226-234.
    [4] Berger, J. O., De Oliveira, V. and Sansò, B. (2001). Objective Bayesian analysis of spatially correlated data. Journal of the American Statistical Association, 96, 1361-1374.
    [5] Gu, M., Wang, X. and Berger, J. O. (2017). Robust Gaussian Stochastic Process Emulation. Submitted to the Annals of Statistics.
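To make the additive construction concrete, the sketch below builds the kernel as a sum of univariate squared-exponential kernels, one per input dimension (cf. [2,3]), and computes the GP posterior mean in plain numpy. Hyperparameters are fixed by hand rather than estimated via the reference priors described in the abstract.

```python
# Additive GP sketch: kernel = sum over dimensions of 1-D RBF kernels.
# Hyperparameters are fixed, not estimated; data are simulated.
import numpy as np

def additive_kernel(A, B, length_scales, variances):
    """Sum of univariate squared-exponential kernels, one per dimension."""
    K = np.zeros((A.shape[0], B.shape[0]))
    for d, (l, s) in enumerate(zip(length_scales, variances)):
        diff = A[:, d][:, None] - B[:, d][None, :]
        K += s * np.exp(-0.5 * (diff / l) ** 2)
    return K

rng = np.random.default_rng(5)
X = rng.uniform(-2.0, 2.0, size=(40, 3))             # 3-dimensional inputs
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2] + rng.normal(0.0, 0.05, 40)

ls, var, noise = [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 1e-3
K = additive_kernel(X, X, ls, var) + noise * np.eye(len(X))
alpha = np.linalg.solve(K, y)                        # (K + noise * I)^{-1} y

X_new = rng.uniform(-2.0, 2.0, size=(5, 3))
post_mean = additive_kernel(X_new, X, ls, var) @ alpha
print(post_mean.round(2))
```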