ENBIS: European Network for Business and Industrial Statistics

ENBIS-15 in Prague

6 – 10 September 2015; Prague, Czech Republic
Abstract submission: 1 February – 3 July 2015

The following abstracts have been accepted for this event:

• EaR Confidence Interval Calculation Using Clustered Bootstrap

Authors: Jérôme Collet (EDF R&D)
Primary area of focus / application: Finance
Keywords: EaR, Bootstrap, Cluster, Simulation study
Submitted at 23-Apr-2015 14:12 by Jérôme Collet
Accepted (view paper)
9-Sep-2015 09:20 EaR Confidence Interval Calculation Using Clustered Bootstrap
Like any utility, EDF manages its financial risk by computing EaR (Earnings at Risk). This EaR computation accounts for risk in both prices and volumes. It is necessary to know the accuracy of the EaR computation, which is why we address confidence interval computation for the EaR here.
Price risk is represented, as usual, using financial models. In contrast, for volume risk, and especially regarding the activity in France, we want to be consistent with the representation used for generation management. This representation is very detailed, using historical time series for temperature and detailed modelling of generation means. As a result, we have very different amounts of data to represent the random variables used in the EaR computation: 121 years for temperature, a few hundred years for other hazards simulated for generation management (plant unavailabilities, ...), and around 5000 years for prices. Due to this difference in data set size, each temperature scenario is used for many earnings scenarios. The earnings scenarios are therefore not independent, so the usual confidence interval methods do not apply.
The solution is to reproduce in the bootstrap the dependencies existing in the original simulation. This general statement leads to a specific bootstrap method, which we implemented. We studied simple toy cases derived from our real case. The first conclusion is that the usual bootstrap is wrong in such a case; the second is that our specific bootstrap gives correct confidence intervals.
On real cases, we obtain confidence intervals consistent with other accuracy evaluations. We now plan to use the method routinely for EaR assessments.
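The core idea of resampling whole temperature clusters, rather than individual earnings scenarios, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the cluster sizes, quantile level, and toy data-generating process are invented for the example:

```python
import random

def clustered_bootstrap_ear(clusters, alpha=0.05, n_boot=2000, seed=1):
    """Bootstrap confidence interval for the EaR (here taken as an
    alpha-quantile of earnings) when earnings scenarios sharing a
    temperature scenario are dependent: resample whole clusters,
    keeping each cluster's scenarios together."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # draw clusters with replacement, not individual scenarios
        sample = [s for c in rng.choices(clusters, k=len(clusters)) for s in c]
        sample.sort()
        estimates.append(sample[int(alpha * len(sample))])  # empirical quantile
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot)]

# toy example: 50 temperature scenarios, each reused for 100 earnings draws
rng = random.Random(0)
clusters = []
for _ in range(50):
    temp_effect = rng.gauss(0, 1)  # effect shared within the cluster
    clusters.append([temp_effect + rng.gauss(0, 0.5) for _ in range(100)])
lo, hi = clustered_bootstrap_ear(clusters)
print(lo, hi)
```

An ordinary bootstrap over the 5000 individual earnings values would treat them as independent and understate the interval width; resampling the 50 clusters preserves the within-cluster dependence.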
• Analysis of Tourist Expenditure: Bringing Shape and Volume Together

Authors: Berta Ferrer-Rosell (University of Girona)
Primary area of focus / application: Other: CoDa
Keywords: Trip budget share, Tourist expenditure, CoDa, MANOVA
Submitted at 27-Apr-2015 10:06 by Berta Ferrer-Rosell
Accepted
7-Sep-2015 12:00 Analysis of Tourist Expenditure: Bringing Shape and Volume Together
For the tourism industry, understanding the determinants of tourism expenditure is vital. Microeconometric analysis of tourist expenditure as a function of traveller characteristics has traditionally focused on total absolute expenditure and on absolute expenditures per trip budget part.

The analysis of total absolute expenditure (budget volume, i.e. how much tourists spend) ignores how tourists spend (budget allocation). The analysis of absolute trip expenditure by budget parts confounds how much and how tourists spend: the literature is full of studies in which traveller characteristics affect all trip budget parts about equally.

Using official micro-level statistical data for air travellers to Spain in 2012 and the compositional data analysis (CoDa) methodology, this study first isolates the determinants of trip budget allocation. The study considers allocation between transportation expenses and at-destination expenses (on the one hand accommodation and food, and on the other activities, shopping, etc.).

As a second step, the study brings trip budget allocation and trip total expenditure together in order not to ignore total trip budget volume. To do that, CoDa with a total is used. This total does not need to aggregate all budget parts; it can include some absolute expenditure(s) related to the problem at hand. For instance, for the airline industry it might be interesting to analyse transportation expense, or destinations might be interested in knowing how much tourists spend on anything but transportation.

Results show that some variables affect only the expenditure allocation, some others affect only the total, and some affect both: how and how much tourists spend. For instance, regarding age, retired tourists spend comparatively more at destination than on transportation, and at-destination they spend relatively more on accommodation and food. In absolute terms, they are the heaviest spenders.
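As a rough illustration of the CoDa idea used above (not the study's actual code; the budget parts and amounts are invented), expressing a trip budget in log-ratio coordinates separates allocation from volume:

```python
import math

def closure(parts):
    """Rescale absolute expenditures to shares summing to 1 (the composition)."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(shares):
    """Centred log-ratio transform: carries allocation information only,
    invariant to the budget's total volume."""
    gmean = math.exp(sum(math.log(s) for s in shares) / len(shares))
    return [math.log(s / gmean) for s in shares]

# invented trip budget: transportation, accommodation+food, other at-destination
budget = [300.0, 450.0, 250.0]
shares = closure(budget)
coords = clr(shares)
total = sum(budget)  # the "total" analysed alongside the composition

# doubling every expense changes the total but not the clr coordinates
coords2 = clr(closure([2 * x for x in budget]))
print(coords, coords2, total)
```

This is why modelling the coordinates captures "how" tourists spend, while the accompanying total captures "how much"; the two can then be analysed jointly, e.g. by MANOVA as in the abstract.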
• Integer Linear Programming Approaches to Find Row-Column Arrangements of Two-Level Orthogonal Designs

Authors: Nha Vo-Thanh (Faculty of Applied Economics, University of Antwerp, Belgium), Peter Goos (Faculty of Bioscience Engineering and Leuven Statistics Research Centre (LSTAT), KU Leuven, Belgium), Eric D. Schoen (Faculty of Applied Economics, University of Antwerp, Belgium)
Primary area of focus / application: Design and analysis of experiments
Secondary area of focus / application: Design and analysis of experiments
Keywords: Aliasing, Confounding, Crossed blocking factors, Generalized word-length pattern, Integer linear programming
Submitted at 27-Apr-2015 16:37 by Nha Vo-Thanh
Accepted (view paper)
8-Sep-2015 10:10 Integer Linear Programming Approaches to Find Row-Column Arrangements of Two-Level Orthogonal Designs
Nonregular fractional factorial designs offer flexibility in terms of run size as well as the possibility to estimate partially aliased effects. For this reason, there is much recent research on finding good nonregular designs and arranging them in orthogonal blocks. In this contribution, we address the problem of arranging the runs in case there are two crossed blocking factors. We propose two integer linear programming approaches to find an arrangement of a given two-level orthogonal design such that the treatment factors’ main effects are orthogonal to both blocking factors. The first is a sequential approach, especially useful when one blocking factor is more important than the other; the second is a simultaneous approach for the case where both blocking factors are equally important. We illustrate both approaches with numerical examples.
• Kriging and Artificial Neural Networks in Predicting Turbine Performances

Authors: Grazia Vicario (Politecnico di Torino), Giuseppe Craparotta (Politecnico di Torino), Giovanni Pistone (De Castro Statistics, Collegio Carlo Alberto)
Primary area of focus / application: Design and analysis of experiments
Keywords: Kriging, Variogram, Artificial neural networks, Adaptive designs, Low pressure turbines
Submitted at 28-Apr-2015 12:15 by Grazia Vicario
Accepted (view paper)
8-Sep-2015 17:00 Kriging and Artificial Neural Networks in Predicting Turbine Performances
Since the 1950s, non-stochastic simulation models have often supported real experiments in industrial research, thanks to a great deal of research activity in the areas of Finite Element Method (FEM) simulations and Computational Fluid Dynamics (CFD) [1]. Nonetheless, simulation models can take considerable time to run because of their complexity; in other contexts, the design region may be so extensive that too large a number of physical runs would be demanded to cover it. It has therefore become common practice to provide a mathematical tool, the metamodel: a surrogate model (a model of a model), a global approximation of the computer experiment response over the design space capable of capturing local minima/maxima.
This paper focuses on a comparison of the performances of two metamodels, the Kriging model and the Artificial Neural Networks (ANNs), in predicting the result of CFD experiments providing energy loss in Low Pressure Turbines (LPT) [2].
The Kriging model, proposed in geostatistics [4], is doubtless the most popular metamodel because of its recognized ability to provide good predictions. Kriging is already familiar in turbomachinery investigations, particularly when dealing with optimization models. Its main strength is that it allows for the underlying correlation structure among the responses. The correlation structure may be evaluated by estimating the parameters of the spatial correlation function (SCF) or by means of the variogram [5], favoured by geostatisticians; refinements of its specification naturally improve the predictions. In addition to these two traditional methods, we present an innovative use of the experimental variogram [6] as a non-parametric tool that avoids fitting and estimating a correlation model.
ANNs [3] are good at fitting nonlinear, generally unknown functions and at recognizing patterns that may depend on a large number of inputs. They have applications in many areas, such as aerospace, automotive, electronics, manufacturing, robotics, and telecommunications.
Finally, we show an application to the design of an LPT, since good predictions make it possible to optimize turbine performance, reducing specific fuel consumption. An adaptive design is used, i.e. a strategy that optimizes the selection of the sample sites, in order to reach good accuracy of the metamodel at an acceptable computational cost. The prediction of turbine performance is made both through Kriging, using the two different approaches to characterizing the correlation structure, and through Artificial Neural Networks. The different procedures are compared in order to establish which one guarantees higher accuracy at an acceptable cost.
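For readers unfamiliar with the experimental variogram mentioned above, a minimal sketch of the classical estimator on 1-D data is given below. This is illustrative only, with invented data, and is not the authors' code:

```python
import math
import random

def experimental_variogram(x, z, lags, tol):
    """Classical estimator: gamma(h) = (1 / 2N(h)) * sum (z_i - z_j)^2
    over all pairs whose separation |x_i - x_j| falls within h +/- tol."""
    gammas = []
    for h in lags:
        sq_diffs = [
            (z[i] - z[j]) ** 2
            for i in range(len(x))
            for j in range(i + 1, len(x))
            if abs(abs(x[i] - x[j]) - h) <= tol
        ]
        gammas.append(0.5 * sum(sq_diffs) / len(sq_diffs) if sq_diffs else None)
    return gammas

# invented 1-D example: a smooth response sampled on a grid plus small noise
rng = random.Random(0)
x = [0.1 * i for i in range(100)]
z = [math.sin(xi) + rng.gauss(0, 0.05) for xi in x]
gam = experimental_variogram(x, z, lags=[0.1, 0.5, 1.0, 2.0], tol=0.01)
print(gam)
```

For a smooth response, the estimated semivariance grows with the lag; using these empirical values directly, instead of fitting a parametric correlation model to them, is the non-parametric shortcut the abstract alludes to.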
• Problems in Implementing New Measurement Unit Definitions of the SI Using Fundamental Constants

Authors: Franco Pavese (formerly INRIM, Torino)
Primary area of focus / application: Metrology & measurement systems analysis
Secondary area of focus / application: Consulting
Keywords: Measurement units, New definitions, Fundamental constants, Metrology, Enterprise, Society, National standards
Submitted at 28-Apr-2015 19:18 by Franco Pavese
Accepted (view paper)
7-Sep-2015 11:30 Problems in Implementing New Measurement Unit Definitions of the SI Using Fundamental Constants
IMEKO TC21 Session
The issue of measurement units concerns the whole scientific community, but also involves a much wider audience, including users of measurements at all levels of precision, often comparable with the best scientific level achieved. Thus it includes enterprise and society, and the issue is not only a scientific one, but even more critically a practical and an economic one. This was duly taken into account in the Metre Treaty, 150 years ago.
The use of some ‘fundamental constants’ of physics (c0, h, e, kB, NA) with stipulated numerical values has been proposed for the re-definition of some of the present base units of the International System of Units, the SI. This proposal was submitted to the Conférence Générale des Poids et Mesures at its 2011 meeting in Sèvres, which decided in its Resolution 1 to “take note of the intention of the CIPM” regarding “the possible future revision of the International System of Units”. This position was confirmed in 2014, and a roadmap to 2018 is now available.
Problems with this proposal have been raised in the literature over the past 10 years, relating to four main areas: (i) how meaningful a progress can be achieved by using fundamental constants in the definition of the measurement units; (ii) which constants should be used and how the definitions should use them; (iii) how the definitions should precisely be formulated; (iv) how to deal with future data.
In this paper, the following issues arising from the above problems are summarised and illustrated:
(1) why have more multi-dimensional base units instead of fixing the present problems in that respect;
(2) how the definitions of the units take into account the multi-dimensionality of the constants;
(3) how the fundamental constants establish the magnitude of the SI base units;
(4) to use or not to use CODATA ‘adjusted values’ of the constants for this specific purpose;
(5) formal issues in stipulating algebraic expressions in the definitions, and in respect to the rounding or truncation of the numerical values in their transformation from uncertain to exact values;
(6) formal issues related to the integer number NA;
(7) handling limitations that can arise from the stipulation of the values of several constants, namely for the CODATA Task Group, and taking into account future data;
(8) implementation of the ‘New SI’ at the NMI and society level;
(9) creating anew a hierarchy among the Countries signatories of the Metre Treaty concerning their National standards realising the base units.
• Sequential Outlier Detection to Reveal Risk Devices in Semiconductor Industry

Authors: Anja Zernig (KAI - Kompetenzzentrum für Automobil- und Industrieelektronik GmbH), Olivia Bluder (KAI - Kompetenzzentrum für Automobil- und Industrieelektronik GmbH), Jürgen Pilz (Alpen-Adria Universität Klagenfurt), Andre Kästner (Infineon Technologies Austria AG)
Primary area of focus / application: Reliability
Secondary area of focus / application: Process
Keywords: Semiconductor industry, Statistical screening methods, Hypothesis testing, Negentropy, ICA
Submitted at 29-Apr-2015 09:17 by Anja Zernig
Accepted
8-Sep-2015 15:35 Sequential Outlier Detection to Reveal Risk Devices in Semiconductor Industry
In the semiconductor industry, and especially in the automotive sector, devices must meet the highest quality standards as they are often used in safety-relevant applications, such as airbags. Therefore, various measurement data collected during the production process are monitored and reviewed for anomalies. Devices which are not functional at all and devices with measurement values outside pre-defined specification limits are scrapped immediately. Besides functionality tests and specification limits, statistical screening methods are applied to minimize the risk of delivering unreliable devices (risk devices). The difficulty in detecting risk devices is that such devices are, e.g., electrically fully functional, but contain a hidden risk which makes them unreliable in their long-term behaviour.
If we assume that the measurement data are subject to Gaussian noise, deviations from a Gaussian distribution indicate the presence of suspicious measurements, representing devices with a higher risk of failure. Removing these devices moves the distribution of the remaining devices towards a Gaussian distribution. In this paper, a sequential procedure for cleaning measurements by removing such devices is presented.
The framework of statistical hypothesis testing is used to check whether the distribution of the measurement data follows a Gaussian distribution or not. If the null hypothesis has to be rejected, meaning that the data do not follow a Gaussian distribution, the devices which contribute most to the decision are identified. These devices are then removed sequentially until the distribution of the remaining devices is Gaussian. The removed devices are expected to have a high risk of failing early. This assumption is confirmed by the results of a Burn-In study.
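A rough sketch of such a sequential procedure is given below. It is illustrative only, not the authors' implementation: it uses a hand-coded Jarque-Bera statistic as the normality test and removes the most extreme standardized value at each step, both simplifying assumptions, with invented data:

```python
import random
import statistics

def jarque_bera(data):
    """Jarque-Bera normality statistic: n/6 * (S^2 + (K - 3)^2 / 4),
    where S is the sample skewness and K the sample kurtosis."""
    n = len(data)
    m = statistics.fmean(data)
    sd = statistics.pstdev(data)
    s = sum(((x - m) / sd) ** 3 for x in data) / n
    k = sum(((x - m) / sd) ** 4 for x in data) / n
    return n / 6 * (s ** 2 + (k - 3) ** 2 / 4)

def sequential_trim(data, crit=5.99, max_remove=50):
    """Remove the observation farthest from the mean until the JB
    statistic drops below the chi-square(2) 5% critical value
    (or a removal budget is exhausted, as a safeguard)."""
    data = list(data)
    removed = []
    while jarque_bera(data) > crit and len(removed) < max_remove:
        m = statistics.fmean(data)
        worst = max(data, key=lambda x: abs(x - m))
        data.remove(worst)
        removed.append(worst)
    return data, removed

# invented example: Gaussian measurements contaminated by a few outliers
rng = random.Random(42)
values = [rng.gauss(1.0, 0.1) for _ in range(500)] + [2.0, 2.5, -1.0]
cleaned, flagged = sequential_trim(values)
print(len(flagged), sorted(flagged))
```

In the abstract's setting, the flagged values would correspond to devices with a higher risk of early failure; any reasonable normality test could replace the Jarque-Bera statistic here.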
To check whether an empirical distribution is close to Gaussian, various statistical tests are available. Well-known normality tests include Lilliefors, Shapiro-Wilk, Anderson-Darling and Jarque-Bera, each with its individual advantages and disadvantages. A quite novel approach to normality testing is provided by the information-theoretic quantity called negentropy. It measures the difference between the entropy of a Gaussian sample and the entropy of the sample under investigation. If the sample under investigation (here, measurements) is close to Gaussian, the value of the negentropy tends to zero. Although calculating the entropy requires the probability density function, in the discrete case various approximations are available. These approximations are frequently used in Independent Component Analysis algorithms and can be adapted for the purpose of statistical normality testing.
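One widely used negentropy approximation from the ICA literature can be illustrated as follows. This sketch uses the log-cosh contrast function common in ICA practice; the Gaussian reference expectation is estimated by Monte Carlo purely for illustration, and the examples are invented:

```python
import math
import random
import statistics

def logcosh_negentropy(sample, rng, n_mc=100_000):
    """Approximate negentropy as J(y) ~ (E[G(y)] - E[G(nu)])^2 with
    G(u) = log cosh(u) and nu standard Gaussian. The sample is
    standardized first; E[G(nu)] is estimated by Monte Carlo here
    rather than taken from a tabulated constant."""
    m = statistics.fmean(sample)
    sd = statistics.pstdev(sample)
    y = [(x - m) / sd for x in sample]
    e_y = statistics.fmean(math.log(math.cosh(v)) for v in y)
    e_nu = statistics.fmean(math.log(math.cosh(rng.gauss(0, 1)))
                            for _ in range(n_mc))
    return (e_y - e_nu) ** 2

rng = random.Random(0)
gaussian = [rng.gauss(0, 1) for _ in range(5000)]
uniform = [rng.uniform(-1, 1) for _ in range(5000)]
j_gauss = logcosh_negentropy(gaussian, rng)
j_unif = logcosh_negentropy(uniform, rng)
print(j_gauss, j_unif)
```

A near-Gaussian sample yields a value close to zero, while a clearly non-Gaussian (e.g. uniform) sample yields a larger one, which is what makes the quantity usable as a normality-test statistic.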