# ENBIS: European Network for Business and Industrial Statistics

### Overview of all Abstracts

The following PDF contains the abstract book as it will be handed out at the conference; it is provided here for browsing and later reference: All abstracts as PDF

The following abstracts have been accepted for this event:

• A Corrected Likelihood-Based Confidence Area for Weibull Distribution Parameters and Large-Scale Life Time Data

Authors: Haselgruber, Nikolaus
Primary area of focus / application:
Submitted at 7-Sep-2007 06:48 by
Accepted
The Weibull distribution is a common lifetime model, in particular for technical applications, and data are often observed in large-scale experiments. Several methods are available to estimate the distribution parameters, and confidence areas are usually computed by applying large-sample theory for maximum likelihood estimators. However, large-scale lifetime experiments are expensive; consequently, samples tend to be small and of short duration, which causes right-censored data. The large-sample theory then loses its applicability.

This presentation suggests a correction of the likelihood-based confidence area which significantly increases its accuracy for small and moderately censored samples.
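The large-sample machinery this abstract starts from can be sketched as follows: maximum likelihood for right-censored Weibull data, with the uncorrected likelihood-ratio confidence region. This is only an illustrative sketch with simulated data and hypothetical sample sizes; the talk's correction itself is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def neg_log_lik(params, t, delta):
    """Negative Weibull log-likelihood with right censoring.

    shape k, scale lam; delta = 1 for an observed failure, 0 for a
    censored observation (which contributes the survival term only).
    """
    k, lam = params
    if k <= 0 or lam <= 0:
        return np.inf
    z = (t / lam) ** k
    log_f = np.log(k / lam) + (k - 1) * np.log(t / lam) - z  # log density
    log_S = -z                                               # log survival
    return -(delta * log_f + (1 - delta) * log_S).sum()

# simulated experiment: true shape 1.5, scale 100, censoring at t = 120
rng = np.random.default_rng(0)
t_true = rng.weibull(1.5, 200) * 100.0
c = np.full(200, 120.0)
delta = (t_true <= c).astype(float)
t = np.minimum(t_true, c)

res = minimize(neg_log_lik, x0=[1.0, np.median(t)], args=(t, delta),
               method="Nelder-Mead")
k_hat, lam_hat = res.x

def in_region(k, lam, level=0.95):
    """(k, lam) lies in the likelihood-based confidence region iff the
    log-likelihood drop from the maximum stays below the chi2(2) quantile."""
    stat = 2 * (neg_log_lik((k, lam), t, delta) - res.fun)
    return stat <= chi2.ppf(level, df=2)
```

The `in_region` test is exactly the large-sample calibration whose accuracy, as the abstract argues, degrades for small and heavily censored samples.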
• The use of intelligent Experimental Designs for Optimal Automotive Engine Calibration Online at Engine Test Bench.

Authors: Thierry Dalon (Siemens VDO Automotive AG, Regensburg, Germany)
Primary area of focus / application:
Submitted at 7-Sep-2007 07:31 by
Accepted
Control-unit calibration for modern internal combustion engines currently faces a conflict: the effort needed to calibrate increasingly complex engine data with a growing number of parameters keeps rising, while the time and resources available for calibration are extremely limited, customers expect ever better performance, consumption and comfort, and emission limits become more and more stringent.

To reduce costs, we seek to shorten testing time at the test bench and hence to use a minimal number of measurements. This leads to optimal experimental design approaches. Designing experiments often involves a trade-off between local and global search: local criteria aim at the best calibration, i.e. the optimization of a target (for example, performance) under many constraints (emissions, consumption), whereas global criteria tend to explore the whole domain or improve model quality.

We present here the context and methods investigated at Siemens VDO Automotive for optimal engine calibration online at the test bench.
The approach will be illustrated on a practical industrial engine calibration example.

Keywords: Automotive Engine Calibration, Design of Experiments, Online Optimization, Model-based/Surrogate Optimization

Specifics: This presentation is related to Dr. Karsten Roepke's field of expertise.
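The local/global trade-off described in this abstract can be illustrated with a deliberately simple acquisition rule: a nearest-neighbour surrogate for the local (exploitation) criterion and a distance-based space-filling term for the global (exploration) one. The weighting, surrogate, and data below are hypothetical, not the Siemens VDO method.

```python
import numpy as np

def next_point(candidates, measured_x, measured_y, w=0.5):
    """Pick the next operating point to measure at the test bench.

    w = 1 is pure exploitation (go where the surrogate predicts the best
    target value), w = 0 pure exploration (go far from measured points).
    """
    # distances from each candidate to each already measured point
    d = np.linalg.norm(candidates[:, None, :] - measured_x[None, :, :], axis=2)
    pred = measured_y[d.argmin(axis=1)]   # nearest-neighbour surrogate value
    spread = d.min(axis=1)                # space-filling (global) criterion

    def scale(v):  # map both criteria to [0, 1] before mixing
        r = v.max() - v.min()
        return (v - v.min()) / r if r > 0 else np.zeros_like(v)

    score = w * scale(pred) + (1 - w) * scale(spread)
    return candidates[score.argmax()]

# toy demo: two measured points, three candidate operating points
measured_x = np.array([[0.0, 0.0], [1.0, 1.0]])
measured_y = np.array([1.0, 2.0])
cands = np.array([[0.0, 0.1], [0.9, 1.0], [0.5, 0.5]])
explore = next_point(cands, measured_x, measured_y, w=0.0)
exploit = next_point(cands, measured_x, measured_y, w=1.0)
```

With `w=0` the rule picks the candidate farthest from all measurements; with `w=1` it picks the candidate closest to the best measured response.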
• Efficient experimental designs in the presence of more than one hard-to-change variable

Authors: Heidi Arnouts, Peter Goos (University of Antwerp, Antwerp, Belgium)
Primary area of focus / application:
Submitted at 7-Sep-2007 08:04 by Heidi Arnouts
Accepted
In real-life experiments, especially in an industrial environment, experimental factors are often not independently reset for each run, typically because of time and/or cost restrictions in the production process. Much research has addressed the situation with only one hard-to-change variable in the experiment, the so-called split-plot experimental design. In industrial settings, however, there are often several factors that are hard to change, so it is also interesting to search for optimal designs involving several hard-to-change variables. Some published research deals with this topic, but under the restriction that all hard-to-change variables are reset at the same time, which reduces the problem to a split-plot experiment. In our research, we relax this constraint and look for D-optimal designs that allow the various hard-to-change variables to be reset at different points in time.
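D-optimal design search of the kind discussed above is typically carried out by coordinate-exchange algorithms. The sketch below does this for an ordinary (completely randomized) main-effects model, i.e. without the hard-to-change error structure that is this paper's actual contribution; model, factor levels, and run size are illustrative only.

```python
import numpy as np

def model_matrix(design):
    # main-effects model with intercept: f(x1, x2) = (1, x1, x2)
    return np.column_stack([np.ones(len(design)), design])

def d_criterion(design):
    """D-criterion: determinant of the information matrix X'X."""
    X = model_matrix(design)
    return np.linalg.det(X.T @ X)

def coordinate_exchange(n_runs=4, levels=(-1.0, 0.0, 1.0),
                        n_starts=5, n_passes=10, seed=0):
    """Greedy coordinate exchange: sweep over run/factor coordinates and
    accept any level change that strictly increases the D-criterion."""
    rng = np.random.default_rng(seed)
    best_design, best_val = None, -np.inf
    for _ in range(n_starts):                 # random restarts
        design = rng.choice(levels, size=(n_runs, 2))
        val = d_criterion(design)
        for _ in range(n_passes):
            for i in range(n_runs):
                for j in range(2):
                    for lev in levels:
                        old = design[i, j]
                        design[i, j] = lev
                        new_val = d_criterion(design)
                        if new_val > val:
                            val = new_val
                        else:
                            design[i, j] = old  # revert non-improving move
        if val > best_val:
            best_design, best_val = design, val
    return best_design, best_val

design, best_val = coordinate_exchange()
```

For four runs and two factors the D-optimal design is the 2^2 factorial, with det(X'X) = 64; extending the criterion to several independently resettable hard-to-change factors mainly changes the information matrix, not the exchange logic.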
• Designs for first-order interactions in choice experiments with binary attributes

Authors: Heiko Grossmann, Rainer Schwabe, Steven G. Gilmour
Primary area of focus / application:
Submitted at 7-Sep-2007 11:17 by
Accepted
Choice experiments aim at understanding how preferences for goods or services are influenced by the features of competing options, and applications in marketing, health economics and other fields abound. In recent years, the efficient design of choice experiments has attracted considerable interest. Typically, these designs have been derived within the framework of the multinomial logit (MNL) model. When the choice probabilities within each choice set are assumed to be equal, the design problem for the MNL model is equivalent to the corresponding problem for an approximating linear model. By exploiting this correspondence between the design problems, this talk derives new exact designs for choice experiments involving pairs of options described by a common set of two-level factors, which allow the efficient estimation of main effects and first-order interactions. These designs compare favorably with available alternatives in the literature: for high efficiencies they usually require the same or a considerably smaller number of choice sets, and for the same number of choice sets they attain the same or a higher efficiency.
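The linear-model correspondence mentioned in the abstract can be made concrete: under equal choice probabilities, the MNL information matrix of a paired-choice design equals that of a linear model whose rows are the within-pair differences of the coded profiles. The sketch below (attributes and pairs chosen for illustration, not the designs derived in the talk) also shows why foldover pairs cannot estimate interactions.

```python
import numpy as np
from itertools import combinations

def effects_coding(profile):
    """Main-effect and two-factor-interaction coding of a +/-1 profile."""
    x = np.asarray(profile, dtype=float)
    inter = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return np.concatenate([x, inter])

def pair_design_matrix(pairs):
    """Rows are within-pair differences of coded profiles: under the
    equal-probability assumption, X'X is proportional to the MNL
    information matrix of the paired-choice design."""
    return np.array([effects_coding(a) - effects_coding(b) for a, b in pairs])

# foldover pairs (second option = mirror image of the first): the
# interaction codes are identical in both options, so they cancel
profiles = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
fold = pair_design_matrix([(p, tuple(-v for v in p)) for p in profiles])

# pairs differing in only two attributes retain interaction information
mixed = pair_design_matrix([((1, 1, 1), (1, -1, -1)),
                            ((1, 1, 1), (-1, 1, -1)),
                            ((-1, -1, -1), (-1, 1, 1)),
                            ((-1, -1, -1), (1, -1, 1))])
```

In `fold`, the three interaction columns are identically zero (first-order interactions are inestimable), while `mixed` carries information on both main effects and interactions.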
• Local Models in Data Mining

Authors: Gero Szepannek, Julia Schiffner and Claus Weihs (University of Dortmund, Dortmund, Germany)
Primary area of focus / application:
Submitted at 7-Sep-2007 12:40 by
Accepted
In classification tasks it is sometimes not meaningful to build a single rule on the whole data set. This may especially be the case if the classes are composed of several subclasses.

This talk gives an overview of several methods proposed to solve this problem. These methods can be subdivided into methods that require the subclasses to be specified in advance (see, e.g., Weihs et al., 2006) and methods that determine the locality in the data in an unsupervised manner (see, e.g., Hastie et al., 1996, or Czogiel et al., 2007). Some new approaches are also presented. All methods are evaluated and compared on several real-world classification problems.

References:

Czogiel, I., Luebke, K., Zentgraf, M., Weihs, C. (2007): Localized
Linear Discriminant Analysis. In: Decker,R., Lenz, H. Gaul W. (eds):
Advances in Data Analysis, Springer-Verlag, Heidelberg, 133-140.

Hastie, T., Tibshirani, R., Friedman, J. (1996): Discriminant Analysis by Gaussian Mixtures, JRSS B 58, 158-176.

Weihs, C., Szepannek, G., Ligges, U., Luebke, K. and Raabe, N. (2006):
Local Models in Register Classification by Timbre. In: V.Batagelij,
H.Bock, A.Ferligoj and A.Ziberna (eds): Data Science and Classification,
Springer-Verlag, Heidelberg, 315-322.
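The "subclasses specified in advance" branch (cf. Weihs et al., 2006) can be sketched as a classifier in which each class density is a mixture of one Gaussian per known subclass. This toy implementation and its data are illustrative only, not the methods compared in the talk.

```python
import numpy as np

class LocalGaussianClassifier:
    """Each class density is a mixture of Gaussians, one component per
    known subclass; prediction is Bayes' rule over class densities."""

    def fit(self, X, y, sub):
        self.models = {}
        for c in np.unique(y):
            Xc, sc = X[y == c], sub[y == c]
            comps = []
            for s in np.unique(sc):
                Xs = Xc[sc == s]
                mu = Xs.mean(axis=0)
                cov = np.cov(Xs.T) + 1e-6 * np.eye(X.shape[1])  # ridge
                comps.append((len(Xs) / len(Xc), mu, cov))
            self.models[c] = comps
        self.priors = {c: np.mean(y == c) for c in np.unique(y)}
        return self

    def _log_density(self, comps, x):
        dens = 0.0
        for w, mu, cov in comps:
            d = x - mu
            k = len(mu)
            norm = (2 * np.pi) ** (-k / 2) * np.linalg.det(cov) ** -0.5
            dens += w * norm * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)
        return np.log(dens + 1e-300)

    def predict(self, X):
        classes = sorted(self.models)
        scores = np.array([[self._log_density(self.models[c], x)
                            + np.log(self.priors[c]) for c in classes]
                           for x in X])
        return np.array(classes)[scores.argmax(axis=1)]

# toy data: each class is a union of two well separated subclasses
rng = np.random.default_rng(0)
centers = {(0, 0): (-3, 0), (0, 1): (3, 0), (1, 0): (0, 3), (1, 1): (0, -3)}
X, y, sub = [], [], []
for (c, s), mu in centers.items():
    X.append(rng.normal(mu, 0.3, size=(30, 2)))
    y += [c] * 30
    sub += [s] * 30
X, y, sub = np.vstack(X), np.array(y), np.array(sub)

clf = LocalGaussianClassifier().fit(X, y, sub)
accuracy = (clf.predict(X) == y).mean()
```

A single Gaussian per class would fail on this toy data, since both class means coincide at the origin; the local, per-subclass model separates them easily.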
• Load Shedding: a new proposal

Authors: R. Faranda, A. Pievatolo and E. Tironi
Primary area of focus / application:
Submitted at 7-Sep-2007 13:31 by
Accepted
Resorting to interruptible loads is often the only solution to keep the network in
operation. Normally, in contingencies, the difference between the power
absorbed and the power produced is very low, often less than 1% of the
program, the discomfort would be minimal, considering its usually short
duration. According to this point of view, we present a new approach to
the load shedding program to guarantee the correct electrical system
operation by increasing the number of participants. This new load
control strategy is named Distributed Interruptible Load Shedding
(DILS). Indeed, it is possible to split every user's load into
interruptible and uninterruptible parts, and to operate on the
interruptible part only. The optimal load reduction request is found by
minimizing the expected value of an appropriate cost function, thus
taking the uncertainty about the power absorbed by each customer into
account.
At present, several users, such as hospitals, data centres, supermarkets, universities and industries, might be very interested in typical shedding programs as a way to save money on their electricity bills. In the future, however, when domotic power plants are likely to be widely used, distributors could encourage end users to participate in DILS programs for either economic or social reasons.
By adopting the DILS program, the distributors can resort to the
interruptible loads not only in case of emergency conditions but also

Keywords: Blackout, Demand Side Management, Load Shedding, Interruptible Load, Stochastic Approximation, Uncertain System
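The optimization step described in this abstract, minimizing the expected value of a cost function over the uncertain power absorbed by each customer, can be sketched with Monte Carlo scenarios and a single scalar shedding request. All numbers, the cost function, and the uniform-request policy below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical setting: 100 customers, uncertain absorbed power (kW),
# represented by 5000 Monte Carlo scenarios
absorbed = rng.normal(10.0, 2.0, size=(5000, 100)).clip(min=0)
deficit = 80.0             # power to be shed (kW)
interruptible_frac = 0.3   # interruptible share of each customer's load

def expected_cost(r):
    """Expected cost of asking every customer to shed a fraction r of
    their interruptible load: unmet-deficit penalty plus a discomfort
    term proportional to the total power actually shed."""
    shed = (r * interruptible_frac * absorbed).sum(axis=1)  # per scenario
    imbalance = np.maximum(deficit - shed, 0.0)
    discomfort = shed
    return (10.0 * imbalance + discomfort).mean()

# minimize the expected cost over a grid of request levels
grid = np.linspace(0.0, 1.0, 101)
r_star = grid[np.argmin([expected_cost(r) for r in grid])]
```

The optimal request slightly exceeds the level that meets the deficit on average, because the imbalance penalty makes under-shedding more expensive than a little extra discomfort.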
• On-line diagnostics tools in the Mobile Spatial coordinate Measuring System (MScMS)

Authors: Franceschini F. 1, Galetto M. 1, Maisano D. 1, Mastrogiacomo L. 1
Primary area of focus / application:
Submitted at 7-Sep-2007 15:58 by
Accepted
Keywords: mobile measuring system, wireless sensor networks, dimensional measurements, diagnostics, localization algorithms, physical and model redundancy.
Abstract
Mobile Spatial coordinate Measuring System (MScMS) is a wireless-sensor-network based system developed at the industrial metrology and quality engineering laboratory of DISPEA – Politecnico di Torino. It has been designed to perform simple and rapid indoor dimensional measurements of large-size volumes.
It is made up of three basic parts: a “constellation” of wireless devices (Crickets), freely distributed around the working area; a mobile probe to register the coordinate points of the measured object (using the constellation as a reference system); and a PC to store the data sent – via Bluetooth – by the mobile probe and to process them using ad hoc application software written in Matlab. Crickets and mobile probe use ultrasound (US) transceivers to communicate and to evaluate mutual distances.
The system makes it possible to calculate the position – in terms of spatial coordinates – of the object points “touched” by the probe. Acquired data are then available for different types of elaboration (determination of distances, curves or surfaces of measured objects).
To protect against causes of error such as US signal diffraction and reflection, external uncontrolled US sources (key jingling, neon blinking, etc.), or non-acceptable solutions produced by the software algorithms, MScMS implements several statistical tests for on-line diagnostics. Three of them are analyzed in this paper: “energy model diagnostics”, based on the “mass-spring system” localization algorithm; “distance model diagnostics”, based on a distance reference standard embedded in the system; and “sensor physical/model diagnostics”, based on the redundancy of the Crickets’ US transceivers. For each measurement, if all these tests are satisfied at once, the measured result may be considered acceptable at a specific confidence level; otherwise, the measurement is rejected.
This paper, after a general description of the MScMS, focuses on the description of these three on-line diagnostic tools. Some preliminary results of experimental tests carried out on the system prototype in the industrial metrology and quality engineering laboratory of DISPEA – Politecnico di Torino are also presented and discussed.
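The underlying localization step, multilateration of the probe from its distances to the Cricket constellation, and the way physical/model redundancy enables diagnostics can be sketched as follows. Beacon layout, noise level, and the corrupted-distance scenario are invented for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

# hypothetical Cricket constellation (known 3-D positions, metres)
beacons = np.array([[0.0, 0.0, 2.5], [4.0, 0.0, 2.5],
                    [0.0, 4.0, 2.5], [4.0, 4.0, 2.5], [2.0, 2.0, 3.0]])
true_pos = np.array([1.0, 2.0, 0.8])      # probe position to recover
rng = np.random.default_rng(2)
dists = np.linalg.norm(beacons - true_pos, axis=1) + rng.normal(0, 0.01, 5)

def residuals(p, d):
    """Difference between modelled and measured beacon distances."""
    return np.linalg.norm(beacons - p, axis=1) - d

fit = least_squares(residuals, x0=np.array([2.0, 2.0, 1.0]), args=(dists,))
err = np.linalg.norm(fit.x - true_pos)

# physical/model redundancy: five distances for three unknowns leave
# residual degrees of freedom, so a corrupted distance (e.g. an
# ultrasound reflection) shows up as a large residual
bad = dists.copy()
bad[0] += 1.0                              # simulated reflected path
fit_bad = least_squares(residuals, x0=np.array([2.0, 2.0, 1.0]), args=(bad,))
max_resid_ok = np.abs(residuals(fit.x, dists)).max()
max_resid_bad = np.abs(residuals(fit_bad.x, bad)).max()
```

Thresholding the residuals is a simple stand-in for the paper's statistical tests: a clean measurement leaves only noise-sized residuals, while the reflected distance cannot be absorbed by any probe position.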
• Robust estimation of the variogram in computer experiments

Authors: O. Roustant , D. Dupuy, C. Helbert (Ecole des Mines, Saint-Etienne, France)
Primary area of focus / application:
Submitted at 7-Sep-2007 16:04 by Olivier Roustant
Accepted
This article deals with the estimation of the spatial correlation of kriging models in
computer experiments. Coming from geostatistics, the kriging model is a Gaussian
stochastic process
$$Y(x) = m(x) + Z(x)$$
where $x$ is a $d$-dimensional vector, $m(x)$ is a deterministic trend, and $Z(x)$ a stationary
centered stochastic Gaussian process with spatial correlation function $R(h)$. Both trend
and spatial correlation should be estimated from data. In computer experiments, however, a specific parametric form for $R$ is usually assumed, so only its parameters are estimated. The most common choice is the anisotropic power-exponential function:
$$R(h) = \exp\left(-\sum_{k=1}^d \theta_k |h_k|^{p_k}\right), \quad 0 < p_k \leq 2, \; k = 1, \ldots, d$$

This contrasts with geostatistics where the spatial correlation is estimated through the
variogram:
$$2\gamma(h) = var(Z(x+h)-Z(x))$$
Defined for intrinsic processes, the variogram is equivalent to $R(h)$ for stationary processes. Using the variogram instead of the correlation function is recommended even if the process is stationary, because of possible contamination by trend-estimate residuals.
The estimation of $\gamma(h)$ from a given design $x^{(1)},...,x^{(n)}$ is not an easy task, since the random variables $(Z(x + h) - Z(x))^2$ are not independent and are strongly skewed. In particular, large values may affect the estimation, which is why robust estimation is encouraged. Two robust estimators were proposed by Cressie and Hawkins (1980) and Genton (1998). In this paper, we compare the properties of these estimators with a trimmed mean. Simulations with various amounts of outliers are performed, in the same way as Genton's. We observe that both estimators give similar results, and both are outperformed by the trimmed mean. In addition, we extend the study by analyzing the robustness of these estimators to deviations from normality. To this end, a 3-dimensional industrial problem is considered.

References:
Chilès J-P., Delfiner P. (1999), Geostatistics. Modeling Spatial Uncertainty, Wiley & Sons

Cressie N. (1993), Statistics for Spatial Data, Wiley & Sons

Cressie N., Hawkins D.M. (1980), "Robust estimation of the variogram: I", Mathematical Geology, 12 (2), 115-125

Genton M. (1998), "Highly Robust Variogram Estimation", Mathematical Geology, 30 (2), 213-221

Huber P.J. (1977), Robust Statistical Procedures, SIAM

Rousseeuw P.J., Croux C. (1993), "Alternatives to the Median Absolute Deviation", JASA, 88 (424), 1273-1283

Santner T.J., Williams B.J., Notz W.I. (2003). The Design and Analysis of Computer
Experiments, Springer.

Keywords: Computer experiments, Variogram, Kriging model, Anisotropy, Robustness.
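For a regularly spaced 1-D sample, the three estimators compared in this paper can be sketched as below. The trimmed mean is used raw here, without the consistency correction for Gaussian increments that a real comparison would require; the contamination pattern and sample sizes are illustrative only.

```python
import numpy as np
from scipy.stats import trim_mean

def variogram_estimates(z, lag, trim=0.1):
    """Three estimates of 2*gamma(lag) for a regularly spaced 1-D sample."""
    d = z[lag:] - z[:-lag]
    n = len(d)
    matheron = np.mean(d ** 2)                   # classical (Matheron)
    # Cressie-Hawkins (1980): fourth power of the mean square-rooted
    # increment, divided by their bias-correction term
    ch = np.mean(np.sqrt(np.abs(d))) ** 4 / (0.457 + 0.494 / n)
    # trimmed mean of squared increments (raw, no consistency correction)
    trimmed = trim_mean(d ** 2, trim)
    return matheron, ch, trimmed

rng = np.random.default_rng(5)
z = rng.normal(0.0, 1.0, 5000)       # white noise: 2*gamma(1) = 2
clean = variogram_estimates(z, 1)

z_out = z.copy()
z_out[::500] += 100.0                # ten gross outliers
dirty = variogram_estimates(z_out, 1)
```

On clean data, Matheron and Cressie-Hawkins both sit near the true value 2 (the uncorrected trimmed mean is biased low, since squared Gaussian increments are right-skewed); under contamination the classical estimator explodes while the robust ones barely move.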
• Model-robust designs for assessing the uncertainty of simulator outputs with linear metamodels

Authors: B. Gauthier, L. Carraro, O. Roustant (Ecole des Mines, Saint-Etienne, France)
Primary area of focus / application:
Submitted at 7-Sep-2007 16:37 by
Accepted
The aim is to assess the distribution $Y_{sim}(x)$ of the output of a costly simulator when the inputs $x$ are random variables with known distribution $\mu$. Due to the computing time, a Monte Carlo method cannot be applied directly to the simulator but only to an approximate model $Y_{app}(x)$. This metamodel is built with few experiments $X = (x^{(1)},..., x^{(n)})$. The question is: how should the design of experiments $X$ be chosen so that the distributions of $Y_{app}(x)$ and $Y_{sim}(x)$ are close?

Consider a deterministic simulator. In many situations, it is approximated by a linear combination of known basis functions $g_0,...,g_p$:

$$Y_{sim}(x) = \sum_{i=0}^{p}\beta_ig_i(x) + h(x)$$

with $\beta_0,...,\beta_p$ (unknown) real coefficients, and $h$ an unknown function standing for a
model deviation. The corresponding metamodel is:

$$Y_{app}(x) = \sum_{i=0}^{p}\hat{\beta}_ig_i(x) + \eta(x)$$

where, conditionally on the spatial random variables, $(\eta(x))$ is a centered Gaussian process representing the estimation error. The parameters $\hat{\beta}_0,...,\hat{\beta}_p,\hat{\sigma}^2$ have to be estimated from the $n$ simulator values calculated for $x \in X$, for instance by ordinary least squares.
In this framework, one can compute the two spreads $|E(Y_{app}(x))-E(Y_{sim}(x))|$ and $|var(Y_{app}(x))-var(Y_{sim}(x))|$. We show that, under mild conditions on the model deviation $h$, it is possible to choose $X$ to minimize these quantities. We assume that $h$ belongs to a reproducing kernel Hilbert space $H$; in usual cases, this only imposes regularity conditions on $h$. Following Yue and Hickernell (1998), both criteria can be bounded by expressions depending only on $||h||_H$. Optimal designs are then obtained by minimizing the largest eigenvalue of positive definite matrices. Finally, this methodology is extended to stochastic simulators of the form

$$Y_{sim}(x) = \sum_{i=0}^p \beta_i g_i(x) + h(x) + \varepsilon(x)$$

where
$(\varepsilon(x))$ is a Gaussian process modelling the numerical error.

References:

Carraro L., Corre B., Helbert C., Roustant O., Josserand S. (2007). Optimal designs for the
propagation of uncertainty in computer experiments, Chemometrics and Intelligent
Laboratory Systems, to appear.

Carraro L., Corre B., Helbert C., Roustant O. (2005). Construction d'un critère d'optimalité
pour plans d'expériences numériques dans le cadre de la quantification d'incertitudes,
Revue de Statistique Appliquée.

Santner T.J., Williams B.J., Notz W.I. (2003). The Design and Analysis of Computer
Experiments, Springer.

Wahba G. (1990). Spline Models for Observational Data, SIAM, Philadelphia.

Yue R.-X., Hickernell F.J. (1998). Robust designs for fitting linear models with misspecification, Statistica Sinica, 9, 1053-1069.

Keywords: Computer experiments, uncertainty propagation, metamodeling, model-robust
designs, reproducing kernel Hilbert space.
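The two spreads can be made concrete with a toy deterministic simulator. The basis, deviation $h$, and input law $\mu$ below are invented for illustration, and the paper's optimal-design step is not reproduced: the sketch simply fits the linear metamodel by ordinary least squares and propagates $\mu$ through both models by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulator(x):
    # toy "costly" simulator: linear part plus a smooth deviation h(x)
    return 1.0 + 2.0 * x + 0.3 * np.sin(5 * x)

# a few expensive runs on a small design X
X = np.linspace(0.0, 1.0, 6)
y = simulator(X)

# linear metamodel with basis g0 = 1, g1 = x, fitted by OLS
G = np.column_stack([np.ones_like(X), X])
beta, *_ = np.linalg.lstsq(G, y, rcond=None)

# propagate the input distribution mu (uniform on [0, 1]) by Monte Carlo
x_mc = rng.uniform(0.0, 1.0, 100_000)
y_sim = simulator(x_mc)            # feasible for the toy, not in practice
y_app = beta[0] + beta[1] * x_mc

mean_spread = abs(y_app.mean() - y_sim.mean())
var_spread = abs(y_app.var() - y_sim.var())
```

Even with the deviation term present, both spreads stay small for this smooth toy example; the paper's contribution is to choose the design $X$ so that worst-case bounds on these spreads, over all $h$ in the RKHS ball, are minimized.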
• Genetic algorithms and grid technologies in clustering

Authors: Cs. Hajas, Zs. Robotka, Cs. Seres and A. Zempléni (Loránd Eötvös University, Budapest, Hungary)
Primary area of focus / application:
Submitted at 7-Sep-2007 20:38 by
Accepted
Nowadays, very large data sets often have to be processed, and data mining is an important and rapidly developing area for such problems. In this presentation we focus on an important part of such work, namely the clustering of several thousand objects of high dimensionality.

For the clustering, we used a version of the genetic algorithm. Such algorithms imitate the natural selection process by randomly coupling pairs of candidates for the best (fittest) clustering, and they avoid convergence to a local maximum by rare, random mutations. In clustering applications, the objective function is based on the sum of the squared distances between all pairs within the clusters, with a suitable compensation term that favours a small number of clusters.

For large data sets and easily parallelisable algorithms, the use of a grid of computers is a natural, widely used idea. We compared the performance of the grid-based version of our algorithm to the traditional, single-processor version. Our database consisted of 10000 images of medium resolution, so the total size was around 0.5 GB. Such problems may arise in industrial settings as well, for example in welding processes or in character recognition for applications such as car manufacturing (see [1]).

The preprocessing constructs a Gaussian Mixture Model (GMM) representation of the
images. The GMMs are estimated with an improved Expectation Maximization (EM)
algorithm that avoids convergence to the boundary of the parameter space, see [2].
Image clustering is done by matching the representations with a distance-measure,
based on the approximation of the Kullback-Leibler divergence.

References:

[1] Aiteanu, D., Ristic, D., Graser, A.: Content-based threshold adaptation for image processing in industrial application. International Conference on Control and Automation (ICCA '05), 26-29 June 2005, Vol. 2, 1022-1027.

[2] Zs. Robotka and A. Zempléni: Image Retrieval using Gaussian Mixture Models.
SPLST Symposium, Budapest, 2007.
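The matching step described above, comparing image GMMs through an approximation of the Kullback-Leibler divergence, can be sketched by Monte Carlo in one dimension. The divergence between two GMMs has no closed form, hence the sampling-based approximation; the mixture parameters here are illustrative, not the image representations of [2].

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_gmm(weights, means, stds, n):
    """Draw n samples from a 1-D Gaussian mixture."""
    comp = rng.choice(len(weights), p=weights, size=n)
    return rng.normal(np.array(means)[comp], np.array(stds)[comp])

def gmm_logpdf(x, weights, means, stds):
    """Log-density of a 1-D Gaussian mixture, vectorized over x."""
    x = np.asarray(x)[:, None]
    w = np.asarray(weights)[None, :]
    m = np.asarray(means)[None, :]
    s = np.asarray(stds)[None, :]
    dens = (w / (s * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - m) / s) ** 2)
    return np.log(dens.sum(axis=1) + 1e-300)

def kl_mc(p, q, n=50_000):
    """Monte Carlo approximation of KL(p || q) between two GMMs:
    E_p[log p(X) - log q(X)] estimated from samples of p."""
    xs = sample_gmm(*p, n)
    return np.mean(gmm_logpdf(xs, *p) - gmm_logpdf(xs, *q))

# two GMM image representations (weights, means, stds), slightly shifted
p = ([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
q = ([0.5, 0.5], [-2.5, 2.5], [1.0, 1.0])
d = kl_mc(p, q)
```

The estimate is zero for identical mixtures and grows as the components drift apart, which is what makes it usable as a clustering distance (after symmetrization, since KL itself is not symmetric).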