ENBIS-15 in Prague

6 – 10 September 2015; Prague, Czech Republic
Abstract submission: 1 February – 3 July 2015

My abstracts


The following abstracts have been accepted for this event:

  • Robust PCA with FastHCS

    Authors: Eric Schmitt (Protix Biosystems), Kaveh Vakili (KU Leuven)
    Primary area of focus / application: Modelling
    Secondary area of focus / application: Process
    Keywords: High-dimensional data, Outlier detection, Data exploration, Computational statistics
    Submitted at 29-Apr-2015 09:21 by Eric Schmitt
    8-Sep-2015 10:10 Robust PCA with FastHCS
    PCA is widely used to analyze high-dimensional data, but it is sensitive to outliers.
    Robust PCA methods seek fits that are unaffected by the outliers and can reveal them.
However, state-of-the-art robust PCA methods are vulnerable to classes of outliers that commonly arise in business and industrial applications. In particular, outliers that mimic sensor failures, where a single value is returned for multiple measurements, and clusters of unusual observations are well known to cause problems for robust PCA methods. Robust PCA methods that cannot mitigate the influence of these outliers produce results that are no more reliable than classical PCA.
We introduce FastHCS, a general-purpose robust PCA method capable of handling a wider range of outliers than other state-of-the-art methods, including both the configurations those methods resist and others that defeat them. FastHCS has a high breakdown point and is computationally fast.
After detailing the FastHCS algorithm, we carry out an extensive simulation study and present real-data applications from chemometrics, image processing and genetics.
    Our results show that FastHCS is systematically more robust to outliers than other state-of-the-art methods.
FastHCS is implemented in the R package FastHCS, which is available on CRAN.
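As a toy illustration of the sensitivity to outliers that motivates FastHCS (the method itself is implemented in R; this pure-Python sketch assumes nothing about the FastHCS algorithm), the direction of the first principal component of 2-D data can be computed in closed form from the covariance matrix, and a small cluster of outliers is enough to tilt it drastically:

```python
import math

def leading_pc_angle(points):
    """Angle (radians) of the first principal component of 2-D data,
    using the closed-form eigenvector of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    cyy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return 0.5 * math.atan2(2 * cxy, cxx - cyy)

# Clean data lying exactly on the line y = 0.1 * x
clean = [(x, 0.1 * x) for x in range(-10, 11)]
# The same data plus a small cluster of outliers far off the trend
contaminated = clean + [(5, 40), (6, 41), (7, 42)]

print(math.degrees(leading_pc_angle(clean)))         # atan(0.1), about 5.7 degrees
print(math.degrees(leading_pc_angle(contaminated)))  # direction tilted by the outliers
```

Three outlying points out of twenty-four are enough to rotate the classical first component by tens of degrees; a high-breakdown method like FastHCS is designed to resist exactly this effect.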
  • Robust Parameter Identification Applied to the Numerical Simulation of Welding Process

    Authors: Kateryna Dorogan (Electricity of France, Research and Development), Mathieu Carmassi (Electricity of France, Research and Development), Merlin Keller (Electricity of France, Research and Development), Bertrand Iooss (Electricity of France, Research and Development)
    Primary area of focus / application: Quality
    Secondary area of focus / application: Modelling
    Keywords: Robust identification, Optimization, Kriging, Gaussian process, Global sensitivity analysis, MHD models, Weld pool modeling
    Submitted at 29-Apr-2015 11:00 by Kateryna Dorogan
    8-Sep-2015 12:10 Robust Parameter Identification Applied to the Numerical Simulation of Welding Process
This paper proposes a new method for robust parameter identification applied to welding process optimization. Welding is one of the most widely used repair processes in nuclear engineering, so the quality assessment of weld beads is an important issue from both the nuclear-safety and the industrial-cost points of view. In practice, however, this task is far from straightforward because of the large number of parameters to be controlled. For instance, we are interested in numerically reproducing one such weld found in French nuclear power plants. This weld has proven difficult to control and reproduce experimentally, and attempts to predict the weld pool shape by numerical simulation were inconclusive because of uncertainties in the material and welding parameters. Moreover, since the computational cost of such simulations is very high (the unsteady MHD equations are solved), we could not cover all possible configurations in order to choose the best one for each set of input parameters (chemical composition of the welded materials, geometry, etc.).
Another way to tackle this problem is based on methods for identification under uncertainty. We are looking for the set of welding parameters that allows the final weld pool shape to satisfy certain geometrical criteria. However, the existing approaches are not, a priori, adapted to identification problems with several constraints. Moreover, some inherent parameters are uncertain and must be treated probabilistically: the heat source representing the welding torch, and the material properties, which vary with temperature and chemical composition. The problem can thus be formulated in a robust identification context.
    To solve this problem, the following methodology is proposed:

1. Performing a global sensitivity analysis of the model in order to retain the most important parameters to be identified. Since this step must be carried out with as few numerical simulations as possible, a screening technique (the Morris method) is used.
    2. Calibrating the input parameters of the mathematical model with respect to available data on some real experiments.
    3. Identifying the domain of variation of the welding parameters with respect to the constraints on the model outputs. This step refers to the class of robust inversion problems.

Since the mathematical model used for welding simulations is computationally expensive (several hours per run), a metamodeling technique is used in steps 2 and 3. It enables efficient, adaptive numerical design of experiments.
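Step 1 relies on Morris screening. The sketch below implements a simplified one-at-a-time elementary-effects variant (not the full Morris trajectory design) on a made-up toy function, not the welding model; it shows how the mean absolute elementary effect mu* ranks inputs by importance:

```python
import random

def elementary_effects(f, bounds, r=20, delta=0.1, seed=0):
    """Crude Morris-style screening: at r random base points, perturb one
    input at a time by delta (in normalized [0, 1] coordinates) and
    average the absolute elementary effect of each input."""
    rng = random.Random(seed)
    k = len(bounds)
    totals = [0.0] * k
    for _ in range(r):
        u = [rng.uniform(0.0, 1.0 - delta) for _ in range(k)]
        x = [lo + ui * (hi - lo) for ui, (lo, hi) in zip(u, bounds)]
        fx = f(x)
        for i in range(k):
            u2 = list(u)
            u2[i] += delta
            x2 = [lo + ui * (hi - lo) for ui, (lo, hi) in zip(u2, bounds)]
            totals[i] += abs((f(x2) - fx) / delta)
    return [t / r for t in totals]  # mu*: mean absolute elementary effect

# Toy model: input 0 dominates, input 1 is moderate, input 2 is inert.
def model(x):
    return 10.0 * x[0] + 2.0 * x[1] ** 2 + 0.0 * x[2]

mu_star = elementary_effects(model, bounds=[(0, 1)] * 3)
print(mu_star)  # mu*[0] is largest; mu*[2] is exactly zero
```

In a screening step such as the one described above, inputs with negligible mu* would be fixed at nominal values before calibration.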
  • Model Based Black-Box Optimization with Mixed Qualitative and Quantitative Inputs

    Authors: Momchil Ivanov (Technische Universität Dortmund)
    Primary area of focus / application: Design and analysis of experiments
    Keywords: Kriging, Qualitative inputs, EGO algorithm, Fanova decomposition, Computer experiments
    Submitted at 29-Apr-2015 14:04 by Momchil Ivanov
    7-Sep-2015 17:00 Model Based Black-Box Optimization with Mixed Qualitative and Quantitative Inputs
Using metamodel-based sequential strategies to analyze and optimize expensive black-box functions has become standard practice in recent years. The Kriging metamodel is a very popular choice for optimization because of its strong predictive performance and its ability to quantify fit uncertainty. One of the strengths of Kriging is that it considers the distance between known data points in order to make correlation-based predictions. However, this becomes a problem when the black-box function takes inputs with mixed qualitative and quantitative values. Only a handful of existing methods are able to produce predictions in this mixed-input case, and some of them have scalability issues that make them unsuitable for sequential optimization.

In this talk we present a novel class of Kriging kernel functions which are based on the Gower distance and are able to model mixed data. Most importantly, our new modified kernel family shows excellent scalability and is easily implemented within one of the standard, well-established optimization procedures: the efficient global optimization (EGO) algorithm. Furthermore, the modified kernels allow other useful procedures to be applied in the mixed-input case, such as the ParOF algorithm, a parallelization and dimensionality-reduction method based on powerful sensitivity analysis techniques.

    The usefulness of our novel procedure for mixed-input black-box functions is shown with the help of a few benchmark functions. A discussion of the future challenges and research directions in this useful but underrated field of black-box experiments with mixed qualitative and quantitative inputs concludes the presentation.
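The Gower distance underlying the proposed kernels is itself standard: numeric variables contribute range-normalized absolute differences, categorical variables contribute 0/1 mismatches, and the per-variable terms are averaged. A minimal sketch, in which the exponential correlation form and the parameter `theta` are placeholders rather than the kernel family of the talk:

```python
import math

def gower_distance(x, y, ranges):
    """Gower dissimilarity for mixed records: numeric entries contribute
    |x - y| / range, categorical entries (range None) contribute a 0/1
    mismatch; the per-variable terms are averaged."""
    terms = []
    for xi, yi, r in zip(x, y, ranges):
        if r is None:                      # categorical variable
            terms.append(0.0 if xi == yi else 1.0)
        else:                              # numeric variable with known range
            terms.append(abs(xi - yi) / r)
    return sum(terms) / len(terms)

def gower_kernel(x, y, ranges, theta=1.0):
    """Placeholder exponential correlation built on the Gower distance."""
    return math.exp(-theta * gower_distance(x, y, ranges))

# Two mixed records: (temperature, pressure, material)
a = (320.0, 1.5, "steel")
b = (340.0, 1.5, "copper")
ranges = (100.0, 4.0, None)               # None marks the categorical slot
print(gower_distance(a, b, ranges))       # (0.2 + 0.0 + 1.0) / 3 = 0.4
print(gower_kernel(a, b, ranges, theta=2.0))
```

Because the distance is well defined for any mix of variable types, a correlation function built on it can be plugged into a Kriging model, which is what makes an EGO-style sequential loop feasible for mixed inputs.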
  • Informative Prior Distributions for ELISA Analyses

    Authors: Katy Klauenberg (Physikalisch-Technische Bundesanstalt), Monika Walzel (Physikalisch-Technische Bundesanstalt), Clemens Elster (Physikalisch-Technische Bundesanstalt)
    Primary area of focus / application: Modelling
    Keywords: Bayesian inference, Heteroscedasticity, Informative prior distribution, Non-linear modeling, Prior knowledge
    Submitted at 29-Apr-2015 16:08 by Katy Klauenberg
    8-Sep-2015 10:30 Informative Prior Distributions for ELISA Analyses
Immunoassays are bio-analytical tests applied to measure very small concentrations of substances in solution, and they have an immense range of applications. Enzyme-linked immunosorbent assay (ELISA) tests in particular allow the detection of infections, drugs, or hormones (as in the home pregnancy test).

    Inferring an unknown concentration via ELISA usually involves a non-linear heteroscedastic regression and its subsequent inversion. Both can be carried out coherently in a Bayesian framework, for which we develop informative prior distributions. These new priors are based on extensive historical ELISA tests as well as theoretical considerations, e.g. regarding the quality of the immunoassay. The latter leads to two practical requirements for the applicability of the prior distributions: the regression curve is covered well by the calibration points and has a reasonable slope (i.e. the ELISA has a reasonable working range).
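The abstract does not name the regression curve, but a common choice in ELISA calibration is the four-parameter logistic (4PL). A minimal sketch of that forward model and its closed-form inversion, with made-up parameter values; the Bayesian treatment described above goes well beyond such a point estimate by propagating uncertainty through both steps:

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic calibration curve often used for ELISA:
    response at concentration x, with asymptote a as x -> 0 and d as
    x -> infinity, inflection point c and slope parameter b."""
    return d + (a - d) / (1.0 + (x / c) ** b)

def invert_four_pl(y, a, b, c, d):
    """Closed-form inversion: the concentration producing response y
    (valid for responses strictly between the two asymptotes)."""
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

# Illustrative (made-up) calibration parameters
a, b, c, d = 2.0, 1.2, 100.0, 0.05
y = four_pl(50.0, a, b, c, d)
print(invert_four_pl(y, a, b, c, d))  # recovers 50.0 up to rounding
```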

Simulations and sensitivity analyses show that the additional a priori information can lead to inferences that are robust to reasonable perturbations of the model and to changes in the design of the data. The priors' wide applicability is demonstrated on a diverse set of real immunoassays, across different laboratories, analytes and laboratory equipment, as well as for previous and current ELISAs. Consistency checks on real data (similar to cross-validation) underpin the adequacy of the suggested priors.

    The new prior distributions are provided as explicit, closed-form expressions, such that their future use is straightforward. They improve concentration estimation for ELISAs by extending the range of the analyses, decreasing the uncertainty, or giving more robust estimates.
  • Sparse Volatility Modelling for High-Dimensional Time Series

    Authors: Gregor Kastner (Institute for Statistics and Mathematics, WU Wirtschaftsuniversität Wien)
    Primary area of focus / application: Other: ISBA session: Applied Bayesian Modelling - Simplicity meets Flexibility
    Keywords: Dynamic covariance, Curse of dimensionality, Shrinkage, Ancillarity-Sufficiency Interweaving Strategy (ASIS), Predictive distribution
    Submitted at 29-Apr-2015 17:22 by Gregor Kastner
    8-Sep-2015 12:10 Sparse Volatility Modelling for High-Dimensional Time Series
Dynamic covariance estimation for multivariate time series suffers from the curse of dimensionality; this renders parsimonious approaches essential for conducting reliable statistical inference. We address this issue by modeling the underlying volatility dynamics of a time series vector through a lower-dimensional collection of latent time-varying stochastic factors. Furthermore, we apply a Normal-Gamma prior to the elements of the factor loadings matrix. This hierarchical shrinkage prior is a generalization of the Bayesian lasso and effectively pulls the factor loadings of unimportant factors towards zero, thereby further increasing sparsity. Estimation is carried out via Bayesian MCMC methods that yield draws from the high-dimensional posterior and predictive distributions. To guarantee the efficiency of the samplers, we utilize several variants of an ancillarity-sufficiency interweaving strategy (ASIS) for sampling the factor loadings. Through extensive simulation studies, we demonstrate the effectiveness of the approach. Furthermore, we apply the model to a 20-dimensional exchange-rate series and a 300-dimensional vector of stock returns to evaluate predictive performance.
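To fix ideas, the basic factor stochastic-volatility data-generating process can be simulated in a few lines. Everything below is a made-up illustration: a single factor, arbitrary parameter values, and none of the MCMC estimation or ASIS machinery of the talk:

```python
import math
import random

def simulate_factor_sv(T, loadings, phi=0.95, mu=-1.0, sigma=0.2,
                       idio_sd=0.1, seed=1):
    """Simulate a one-factor stochastic-volatility model: a latent factor
    f_t ~ N(0, exp(h_t)) whose log-variance h_t follows an AR(1),
    observed as y_t = loadings * f_t + idiosyncratic noise."""
    rng = random.Random(seed)
    h = mu
    ys = []
    for _ in range(T):
        h = mu + phi * (h - mu) + sigma * rng.gauss(0.0, 1.0)  # AR(1) log-variance
        f = math.exp(h / 2.0) * rng.gauss(0.0, 1.0)            # factor draw
        ys.append([lam * f + idio_sd * rng.gauss(0.0, 1.0)
                   for lam in loadings])
    return ys

# A 5-dimensional series driven by one factor; the zero loadings play the
# role that the Normal-Gamma shrinkage prior pushes unimportant loadings
# towards in the actual model.
y = simulate_factor_sv(T=200, loadings=[1.0, 0.8, 0.5, 0.0, 0.0])
print(len(y), len(y[0]))  # 200 5
```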
  • Translation Invariant Multiscale Energy-based PCA (TIME-PCA) for Monitoring Batch Processes in Semiconductor Manufacturing

    Authors: Tiago Rato (University of Coimbra), Jakey Blue (École des Mines de Saint-Étienne), Marco Reis (University of Coimbra), Jacques Pinaton (STMicroelectronics Rousset)
    Primary area of focus / application: Process
    Keywords: Batch process monitoring, Fault Detection and Classification (FDC), Translation invariant wavelet decomposition, Energy
    Submitted at 29-Apr-2015 18:40 by Tiago Rato
    7-Sep-2015 15:35 Translation Invariant Multiscale Energy-based PCA (TIME-PCA) for Monitoring Batch Processes in Semiconductor Manufacturing
The overwhelming majority of processes in semiconductor manufacturing operate in batch mode, imposing time-varying conditions on the products under processing in a cyclic and repetitive way. Among the state-of-the-art approaches for this type of data, multi-way Principal Component Analysis (PCA) and Partial Least Squares (PLS) [1-3] are the most widespread. Although they are able to incorporate the batch dynamic features into the normal-operation model, they do so at the expense of estimating a large number of parameters, which makes these approaches prone to over-fitting and instability. Moreover, batch trajectories must be well aligned in order to attain the expected performance [4]. This significantly increases the burden on process engineers and operators when implementing these methods. To address these issues and other limitations of current methodologies for process monitoring in semiconductor manufacturing, we propose a Translation Invariant Multiscale Energy-based PCA (TIME-PCA) that requires far fewer estimated parameters. It needs no alignment of process trajectories and is thus easier to implement and maintain.
TIME-PCA is based on the analysis of the energy distribution over the different time-frequency scales of the process. Energy is defined here as the average of the squared elements of a vector of wavelet coefficients at each scale. This set of scale-dependent energies is then monitored within a latent variable framework, MSPC-PCA. The proposed methodology thus comprises two major stages: (i) a signal decomposition stage, where a translation-invariant wavelet transform is applied to each variable, and (ii) a PCA modelling stage performed on the wavelet coefficients' energies. Wavelet decompositions have already been successfully applied to the monitoring of continuous processes [5-7]. However, the present methodology is considerably different. While the conventional multiscale approach monitors each scale independently in order to extract the scales where abnormal activity occurs, TIME-PCA monitors all scales simultaneously using a single PCA model. Therefore, with TIME-PCA the scale-to-scale correlation is explicitly investigated and modelled. The use of wavelet coefficients is also advantageous for fault diagnosis, since they allow the isolation of the specific time-frequency scale at which a fault occurs.
The proposed procedure was tested on real industrial data from the semiconductor industry and was shown to promptly detect the existing faults and to give useful information about their underlying root causes. Descriptive metadata provided by the process engineers confirmed the reliability of the proposed fault detection and classification algorithm.
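The per-scale energy that TIME-PCA monitors, the mean squared wavelet coefficient at each scale, can be sketched with a plain decimated Haar transform; the actual method uses a translation-invariant transform and feeds the resulting energies into a PCA model, neither of which is shown here:

```python
import math

def haar_energies(signal, levels):
    """Multilevel decimated Haar transform; returns the 'energy'
    (mean squared detail coefficient) at each scale, finest first."""
    s = list(signal)
    energies = []
    for _ in range(levels):
        approx = [(s[2 * i] + s[2 * i + 1]) / 2.0 ** 0.5
                  for i in range(len(s) // 2)]
        detail = [(s[2 * i] - s[2 * i + 1]) / 2.0 ** 0.5
                  for i in range(len(s) // 2)]
        energies.append(sum(d * d for d in detail) / len(detail))
        s = approx
    return energies

# A smooth batch trajectory concentrates energy at coarse scales; adding a
# high-frequency disturbance shifts energy to the finest scale.
smooth = [math.sin(2 * math.pi * i / 64) for i in range(64)]
noisy = [v + (0.5 if i % 2 else -0.5) for i, v in enumerate(smooth)]
print(haar_energies(smooth, 3))
print(haar_energies(noisy, 3))  # finest-scale energy jumps
```

A fault that only changes high-frequency behaviour barely moves the raw variable means but stands out sharply in the finest-scale energy, which is what makes these energies useful monitoring features.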

    1. García-Muñoz, et al., Industrial & Engineering Chemistry Research, 2004. 43(18): p. 5929-5941.
    2. MacGregor, et al., Control Engineering Practice, 1995. 3(3): p. 403-414.
    3. Nomikos, et al., Technometrics, 1995. 37(1): p. 41-59.
    4. González-Martínez, et al., Industrial & Engineering Chemistry Research, 2014. 53(11): p. 4339-4351.
    5. Bakshi, AIChE Journal, 1998. 44(7): p. 1596-1610.
    6. Yoon, et al., AIChE Journal, 2004. 50(11): p. 2891-2903.
    7. Reis, et al., AIChE Journal, 2006. 52(6): p. 2107-2119.