ENBIS-17 in Naples

9 – 14 September 2017; Naples (Italy)
Abstract submission: 21 November 2016 – 10 May 2017

High-Dimensional Copulas for Solving Unbalanced Classification Problems in Industrial Risk Mitigation

12 September 2017, 10:50 – 11:10

Abstract

Submitted by
Nicolas Bousquet
Authors
Nicolas Bousquet (EDF R&D), Bertrand Iooss (EDF R&D)
Abstract
Environmental risks that typically affect highly protected production plants include water shortages, floods, and clogging by plants or animals, among others. Such events are rare and, unfortunately, in many situations there is no clear understanding of the physical reasons why a particular phenomenon is an outlier or not.
Expert knowledge about these events is expressed only through the choice of covariates, which are assumed to carry some true information about the regular phenomenon (e.g., a tide within an estuary), together with the assumption that the outlier is "explained" by the same physics, for objective reasons. This lack of knowledge, combined with the need for industrial mitigation, can lead to building a prognostic mechanism (for instance, one that computes a probability of occurrence) based on the statistical classification of feared and non-feared events. In such unbalanced situations, usual learning methods (SVM, random forests, etc.) give results that remain biased because of the very low number of feared events, and very simple empirical techniques (Parzen ratio) can do as well as these more elaborate approaches. Nonetheless, several techniques can be used to improve the ROC curve that characterizes a classification algorithm. Expert information can be incorporated through high-dimensional copulas, which can outperform other methods. A motivating example, concerning the massive clogging problem of French and English production plants, highlights the benefits of this approach.
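As an illustration of the simple empirical baseline mentioned above, the following is a minimal sketch of one plausible reading of a "Parzen ratio" classifier: the two classes each get a Gaussian-kernel (Parzen) density estimate, and the score is the estimated probability of a feared event obtained by Bayes' rule. The abstract does not specify the authors' exact formulation, so the function names, the single covariate, the hand-picked bandwidth, and the synthetic data are all assumptions made for the example. The area under the ROC curve is computed with the Mann-Whitney rank statistic.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Parzen (Gaussian kernel) density estimate built from samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def parzen_ratio_score(x, feared, regular, bandwidth):
    """Estimated probability that x is a feared event (Bayes' rule on two Parzen densities)."""
    prior_feared = len(feared) / (len(feared) + len(regular))
    num = prior_feared * gaussian_kde(feared, bandwidth)(x)
    den = num + (1.0 - prior_feared) * gaussian_kde(regular, bandwidth)(x)
    # Fall back to the prior if both densities underflow to zero.
    return num / den if den > 0.0 else prior_feared


def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney rank statistic (ties count 1/2)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


# Synthetic, purely illustrative data (an assumption, not the authors' data):
# 500 regular events and only 10 feared events, described by one covariate.
rng = random.Random(0)
regular = [rng.gauss(0.0, 1.0) for _ in range(500)]
feared = [rng.gauss(3.0, 1.0) for _ in range(10)]
test_regular = [rng.gauss(0.0, 1.0) for _ in range(200)]
test_feared = [rng.gauss(3.0, 1.0) for _ in range(20)]

h = 0.5  # kernel bandwidth, chosen by hand for this sketch
pos = [parzen_ratio_score(x, feared, regular, h) for x in test_feared]
neg = [parzen_ratio_score(x, feared, regular, h) for x in test_regular]
print(f"AUC on synthetic data: {auc(pos, neg):.3f}")
```

In a real study the covariates would be multivariate and the bandwidth would be selected from the data; the sketch only shows how a density-ratio score and its ROC-based evaluation fit together when feared events are scarce.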
