ENBIS-15 in Prague
6 – 10 September 2015; Prague, Czech Republic
Abstract submission: 1 February – 3 July 2015
The following abstracts have been accepted for this event:
Robust PCA with FastHCS
Authors: Eric Schmitt (Protix Biosystems), Kaveh Vakili (KU Leuven)
Primary area of focus / application: Modelling
Secondary area of focus / application: Process
Keywords: High-dimensional data, Outlier detection, Data exploration, Computational statistics
Submitted at 29-Apr-2015 09:21 by Eric Schmitt
Robust PCA methods seek fits that are unaffected by the outliers and can reveal them.
However, state-of-the-art robust PCA methods are vulnerable to classes of outliers that commonly arise in business and industrial applications. In particular, two kinds are well known to cause issues for robust PCA methods: outliers resembling sensor failures, in which a single value is returned for multiple measurements, and clusters of unusual observations. Robust PCA methods that are unable to mitigate the influence of these outliers produce results that are no more reliable than classical PCA.
We introduce FastHCS, a general-purpose robust PCA method capable of handling a wider range of outliers than other state-of-the-art methods: it copes both with the outliers those methods already resist and with others that challenge them. FastHCS has a high breakdown point and is computationally fast.
After detailing the FastHCS algorithm, we present an extensive simulation study and real-data applications from chemometrics, image processing and genetics.
Our results show that FastHCS is systematically more robust to outliers than other state-of-the-art methods.
Software for FastHCS is provided for R with the FastHCS package, which is available on CRAN.
Robust Parameter Identification Applied to the Numerical Simulation of Welding Process
Authors: Kateryna Dorogan (Electricity of France, Research and Development), Mathieu Carmassi (Electricity of France, Research and Development), Merlin Keller (Electricity of France, Research and Development), Bertrand Iooss (Electricity of France, Research and Development)
Primary area of focus / application: Quality
Secondary area of focus / application: Modelling
Keywords: Robust identification, Optimization, Kriging, Gaussian process, Global sensitivity analysis, MHD models, Weld pool modeling
Submitted at 29-Apr-2015 11:00 by Kateryna Dorogan
Another way to tackle this problem could be based on methods for identification under uncertainty. In fact, we are looking for the set of welding parameters that allows the final weld pool shape to satisfy some geometrical criteria. However, the existing approaches are not, a priori, suited to identification problems with several constraints. Moreover, some inherent parameters are uncertain and have to be modelled probabilistically: the heat source representing the welding torch and the material properties, which vary with the temperature and the chemical composition. The problem can thus be formulated in a robust identification context.
To solve this problem, the following methodology is proposed:
1. Performing a global sensitivity analysis of the model in order to retain the most important parameters to be identified. As this step has to be done with as few numerical simulations as possible, a screening technique (the Morris method) is used.
2. Calibrating the input parameters of the mathematical model with respect to available data on some real experiments.
3. Identifying the domain of variation of the welding parameters with respect to the constraints on the model outputs. This step refers to the class of robust inversion problems.
Since the mathematical model used for welding simulations is computationally expensive (several hours per run), a metamodeling technique is used in steps 2 and 3. It allows an efficient and adaptive numerical design of experiments.
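The screening in step 1 can be illustrated with a minimal sketch of Morris elementary effects. It is not the authors' implementation: the function names, the toy model, the unit-hypercube scaling and the restriction to increasing steps are all assumptions made for illustration. Each trajectory costs k + 1 model runs, which is what makes the method attractive when a single run takes hours.

```python
import random
import statistics


def morris_screening(model, k, r=20, levels=4, seed=0):
    """Simplified Morris screening on the unit hypercube [0, 1]^k.

    Returns (mu_star, sigma): the mean absolute elementary effect and the
    standard deviation of the elementary effects for each of the k inputs.
    Simplification (assumption): only steps of +delta are used.
    """
    rng = random.Random(seed)
    delta = levels / (2.0 * (levels - 1))  # usual step for an even number of levels
    starts = [i / (levels - 1) for i in range(levels // 2)]  # start + delta stays in [0, 1]
    effects = [[] for _ in range(k)]
    for _ in range(r):
        x = [rng.choice(starts) for _ in range(k)]
        y = model(x)
        # Move one input at a time, in random order: k + 1 runs per trajectory.
        for i in rng.sample(range(k), k):
            x_new = list(x)
            x_new[i] += delta
            y_new = model(x_new)
            effects[i].append((y_new - y) / delta)
            x, y = x_new, y_new
    mu_star = [sum(abs(e) for e in ee) / r for ee in effects]
    sigma = [statistics.stdev(ee) for ee in effects]
    return mu_star, sigma


def toy_model(x):
    """Hypothetical stand-in for the expensive simulator; x[2] has no effect."""
    return 10.0 * x[0] + x[1] ** 2


mu_star, sigma = morris_screening(toy_model, k=3)
```

Inputs with a small mu_star are candidates for being fixed; a large sigma points to non-linearity or interactions. In this toy case the third input would be screened out.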
Model Based Black-Box Optimization with Mixed Qualitative and Quantitative Inputs
Authors: Momchil Ivanov (Technische Universität Dortmund)
Primary area of focus / application: Design and analysis of experiments
Keywords: Kriging, Qualitative inputs, EGO algorithm, Fanova decomposition, Computer experiments
Submitted at 29-Apr-2015 14:04 by Momchil Ivanov
In this talk we present a novel class of Kriging kernel functions, which are based on the Gower distance and are able to model mixed data. Most importantly, our new modified kernel family shows excellent scalability and is easily implementable with one of the standard and well established optimization procedures – the efficient global optimization (EGO) algorithm. Furthermore, the modified kernels allow other useful procedures to be applied to the mixed-inputs case, such as the ParOF algorithm, a parallelization and dimensionality reduction algorithm based on powerful sensitivity analysis techniques.
The usefulness of our novel procedure for mixed-input black-box functions is shown with the help of a few benchmark functions. A discussion of the future challenges and research directions in this useful but underrated field of black-box experiments with mixed qualitative and quantitative inputs concludes the presentation.
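As a rough illustration of how a Gower distance can feed a kernel for mixed inputs, the sketch below computes the distance and passes it through an exponential. The abstract does not give the form of the kernel family, so the exponential form, the length-scale `theta` and all function names are assumptions, not the author's definition.

```python
import math


def gower_distance(x, y, kinds, ranges):
    """Gower distance between two mixed-type points.

    kinds[j] is "num" or "cat"; ranges[j] is the span of numeric input j
    (ignored for categorical inputs).
    """
    total = 0.0
    for xj, yj, kind, span in zip(x, y, kinds, ranges):
        if kind == "num":
            total += abs(xj - yj) / span      # range-scaled absolute difference
        else:
            total += 0.0 if xj == yj else 1.0  # simple mismatch for categories
    return total / len(x)


def gower_kernel(x, y, kinds, ranges, theta=0.5):
    """Assumed exponential kernel on the Gower distance (illustrative only)."""
    return math.exp(-gower_distance(x, y, kinds, ranges) / theta)


# Hypothetical inputs: one quantitative factor on [0, 1] and one material type.
kinds = ("num", "cat")
ranges = (1.0, None)
a = (0.2, "steel")
b = (0.7, "copper")

d_ab = gower_distance(a, b, kinds, ranges)
k_aa = gower_kernel(a, a, kinds, ranges)
k_ab = gower_kernel(a, b, kinds, ranges)
```

Because the distance treats every input with one rule per type, the same kernel evaluation works for any mix of qualitative and quantitative factors.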
Informative Prior Distributions for ELISA Analyses
Authors: Katy Klauenberg (Physikalisch-Technische Bundesanstalt), Monika Walzel (Physikalisch-Technische Bundesanstalt), Clemens Elster (Physikalisch-Technische Bundesanstalt)
Primary area of focus / application: Modelling
Keywords: Bayesian inference, Heteroscedasticity, Informative prior distribution, Non-linear modeling, Prior knowledge
Submitted at 29-Apr-2015 16:08 by Katy Klauenberg
Inferring an unknown concentration via ELISA usually involves a non-linear heteroscedastic regression and its subsequent inversion. Both can be carried out coherently in a Bayesian framework, for which we develop informative prior distributions. These new priors are based on extensive historical ELISA tests as well as theoretical considerations, e.g. regarding the quality of the immunoassay. The latter leads to two practical requirements for the applicability of the prior distributions: the regression curve is covered well by the calibration points and has a reasonable slope (i.e. the ELISA has a reasonable working range).
Simulations and sensitivity analyses show that the additional a priori information can lead to inferences which are robust to reasonable perturbations of the model and changes in the design of the data. On a diverse set of real immunoassays, the wide applicability is demonstrated across different laboratories, for different analytes and laboratory equipment as well as for previous and current ELISAs. Consistency checks on real data (similar to cross-validation) underpin the adequacy of the suggested priors.
The new prior distributions are provided as explicit, closed-form expressions, such that their future use is straightforward. They improve concentration estimation for ELISAs by extending the range of the analyses, decreasing the uncertainty, or giving more robust estimates.
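The regression-then-inversion step mentioned above can be sketched for a single point estimate. The abstract does not name the calibration curve; the four-parameter logistic used here is a common choice for immunoassays and is an assumption, as are the function names and parameter values. The heteroscedastic noise model and the Bayesian treatment with informative priors are not reproduced.

```python
def four_pl(conc, a, b, c, d):
    """Four-parameter logistic response (assumed curve form).

    a: response at zero concentration, d: response at saturation,
    c: inflection point, b: slope parameter.
    """
    return d + (a - d) / (1.0 + (conc / c) ** b)


def invert_four_pl(response, a, b, c, d):
    """Concentration that yields `response` on the curve above."""
    low, high = sorted((a, d))
    if not low < response < high:
        # Only responses strictly between the two asymptotes can be inverted here.
        raise ValueError("response lies outside the working range of the curve")
    return c * ((a - d) / (response - d) - 1.0) ** (1.0 / b)


# Hypothetical calibration parameters, for illustration only.
params = dict(a=0.05, b=1.2, c=50.0, d=2.0)
signal = four_pl(20.0, **params)
estimate = invert_four_pl(signal, **params)
```

The inversion is only defined between the asymptotes, which mirrors the abstract's requirement that the assay has a reasonable working range.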
Sparse Volatility Modelling for High-Dimensional Time Series
Authors: Gregor Kastner (Institute for Statistics and Mathematics, WU Wirtschaftsuniversität Wien)
Primary area of focus / application: Other: ISBA session: Applied Bayesian Modelling - Simplicity meets Flexibility
Keywords: Dynamic covariance, Curse of dimensionality, Shrinkage, Ancillarity-Sufficiency Interweaving Strategy (ASIS), Predictive distribution
Submitted at 29-Apr-2015 17:22 by Gregor Kastner
Translation Invariant Multiscale Energy-based PCA (TIME-PCA) for Monitoring Batch Processes in Semiconductor Manufacturing
Authors: Tiago Rato (University of Coimbra), Jakey Blue (École des Mines de Saint-Étienne), Marco Reis (University of Coimbra), Jacques Pinaton (STMicroelectronics Rousset)
Primary area of focus / application: Process
Keywords: Batch process monitoring, Fault Detection and Classification (FDC), Translation invariant wavelet decomposition, Energy
Submitted at 29-Apr-2015 18:40 by Tiago Rato
TIME-PCA is based on the analysis of the energy distribution of the different time-frequency scales of the process. Energy is here defined as the average of the squared elements of a vector (wavelet coefficients) at each scale. This set of scale-dependent energies is then monitored by application of a latent variable framework, MSPC-PCA. Thus, the proposed methodology is composed of two major stages: (i) a signal decomposition stage where a translation invariant wavelet transform is applied to each variable and (ii) a PCA modelling stage performed on the wavelet coefficients’ energy. Wavelet decompositions have already been successfully applied to the monitoring of continuous processes [5-7]. However, the present methodology is considerably different. While the conventional multiscale approach monitors each scale independently in order to extract the scales where abnormal activity occurs, TIME-PCA monitors all scales simultaneously using a single PCA model. Therefore, with TIME-PCA the scale-to-scale correlation is explicitly investigated and modelled. The use of wavelet coefficients is also advantageous for fault diagnosis since they allow for the isolation of the specific time-frequency scale at which the fault is occurring.
The proposed procedure was tested with real industrial data from the semiconductor industry and was shown to detect the existing faults promptly and to give useful information about their underlying root causes. Descriptive metadata provided by the process engineers confirmed the reliability of the proposed fault detection and classification algorithm.
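The energy features of stage (i) can be sketched as follows. The abstract does not state which wavelet is used; the undecimated Haar filter, the 1/2 normalisation, the circular boundary handling and the function names are assumptions chosen to keep the example short. The PCA stage (ii) is omitted.

```python
import math


def haar_swt(signal, levels):
    """Undecimated (translation-invariant) Haar transform, circular boundaries.

    Returns the detail coefficients for each scale and the final approximation.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for j in range(levels):
        step = 2 ** j  # filter taps spread further apart at each coarser scale
        prev = approx
        details.append([(prev[i] - prev[(i - step) % n]) / 2.0 for i in range(n)])
        approx = [(prev[i] + prev[(i - step) % n]) / 2.0 for i in range(n)]
    return details, approx


def scale_energies(signal, levels):
    """Energy per scale: the average of the squared coefficients."""
    details, approx = haar_swt(signal, levels)
    return [sum(c * c for c in band) / len(band) for band in details + [approx]]


# Hypothetical batch trajectory: a slow oscillation with a short bump.
batch = [math.sin(2 * math.pi * i / 16) + (1.0 if 20 <= i < 24 else 0.0)
         for i in range(64)]
shifted = batch[5:] + batch[:5]  # same trajectory, circularly delayed

energies = scale_energies(batch, levels=3)
energies_shifted = scale_energies(shifted, levels=3)
```

Under these assumptions the two energy vectors agree up to rounding, so a delay in the batch trajectory does not by itself move the features. One such vector per variable and batch would then form a row of the data matrix passed to PCA.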
1. García-Muñoz, et al., Industrial & Engineering Chemistry Research, 2004. 43(18): p. 5929-5941.
2. MacGregor, et al., Control Engineering Practice, 1995. 3(3): p. 403-414.
3. Nomikos, et al., Technometrics, 1995. 37(1): p. 41-59.
4. González-Martínez, et al., Industrial & Engineering Chemistry Research, 2014. 53(11): p. 4339-4351.
5. Bakshi, AIChE Journal, 1998. 44(7): p. 1596-1610.
6. Yoon, et al., AIChE Journal, 2004. 50(11): p. 2891-2903.
7. Reis, et al., AIChE Journal, 2006. 52(6): p. 2107-2119.