Submitted abstracts

For the Second Annual Conference on Business and Industrial Statistics

More information on this conference can be found on the events page.

Index by number

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, PO1, PO2, PO3, PO4, WS1, WS2

Sessions

Design of experiments, Statistical modelling, Statistical consulting, Business and economics, Process modelling and control, Reliability and safety, Web mining and analysis, Data mining, Six Sigma and quality improvement, Workshops, Poster presentations, Software session

Design of Experiments

1. Taguchi style Conjoint Analysis in Customer Surveys

Shirley Y. Coleman, D.V.McGeeney and D.J. Stewardson
Industrial Statistics Research Unit, University of Newcastle upon Tyne, UK

Conjoint Analysis is a widely-used survey tool employed to evaluate product or service features. Conjoint Analysis seeks out the desirable features by estimating the structure of a respondent's preferences given their overall evaluations of a set of alternatives. The alternatives present pre-specified levels of the different features or attributes. Conjoint analysts develop and present descriptions of alternative what-if scenarios that are prepared from fractional factorial experimental designs. Taguchi industrial experiments also use these fractional factorial experimental designs. They are based on fundamentally the same concept; however, they differ in a number of important ways. Taguchi experiments often incorporate an inner array that corresponds to the usual orthogonal array of fractional factorial designs and an outer array of nuisance factors. The design factors in the inner array are evaluated in terms of their effect on the mean or level of response and also on the variation of the response across the levels of the outer array. Taguchi recommends a variety of signal-to-noise ratios to summarise the output, but Grove and Davis consider that analysing the effect of the factors on the mean and on the log(sd) is sufficient. In the conjoint analysis application the outer array could be some demographic factor such as age of respondent; thus we may be interested not only in factors which affect customer satisfaction but also in factors which affect the robustness of the responses across all ages. The advantage of conjoint analysis is that respondents are asked to evaluate features in the same manner as consumers, that is, they trade off characteristics against one another during the evaluation process (the term conjoint is a contraction of "consider jointly"). This paper explores the additional information that can be obtained from a Taguchi-style analysis of the responses to conjoint analysis questions in a customer satisfaction questionnaire.
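
For readers who want to see the mechanics, the sketch below (in Python, with entirely simulated ratings and invented factor names; it illustrates the general idea only, not the authors' analysis) computes, for each inner-array profile, factor effects on the mean rating and on log(sd) of the rating across an outer-array demographic such as age group.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical conjoint data: 8 product profiles (inner array, factors A-C)
# rated by respondents from two age groups (outer array).
inner = pd.DataFrame(
    [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)],
    columns=["A", "B", "C"],
)
records = []
for age in ("young", "old"):
    for _, row in inner.iterrows():
        # simulated ratings: factor C matters only for the older group
        ratings = 5 + row.A + 0.5 * row.B + (0.8 * row.C if age == "old" else 0) \
                  + rng.normal(0, 0.3, size=10)          # 10 respondents per cell
        for r in ratings:
            records.append({**row.to_dict(), "age": age, "rating": r})
df = pd.DataFrame(records)

# Taguchi-style summary: mean rating per profile, and log(sd) of the cell means
# taken across the outer-array (age) levels.
cell = df.groupby(["A", "B", "C", "age"])["rating"].mean().reset_index()
summary = cell.groupby(["A", "B", "C"])["rating"].agg(
    mean="mean", log_sd=lambda x: np.log(x.std(ddof=1))
).reset_index()

# Factor effects on the mean (location) and on log(sd) (robustness across ages)
for col in ("A", "B", "C"):
    eff_mean = summary.loc[summary[col] == 1, "mean"].mean() - \
               summary.loc[summary[col] == -1, "mean"].mean()
    eff_lsd = summary.loc[summary[col] == 1, "log_sd"].mean() - \
              summary.loc[summary[col] == -1, "log_sd"].mean()
    print(f"{col}: effect on mean = {eff_mean:.2f}, effect on log(sd) = {eff_lsd:.2f}")
```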

12. Fractional Factorial Experiments in an Industrial Process where Several Factors are Difficult to Change

Frøydis Bjerke, MATFORSK and Kim F. Pearce, ISRU, University of Newcastle upon Tyne

Fractional factorial designs are a class of designed experiments that are commonly used because they are versatile, easily understood by practitioners and implemented in most of the common commercial statistical software packages. In most industrial experiments, a complete randomisation of the individual trials is difficult to achieve. If a factorial design is run with restrictions on randomisation this will influence the error structure, which must be considered during the analysis and statistical modelling of the results. In this paper the effects of modifying the standard ("ideal") design plan in order to obtain a feasible experimental plan will be discussed. The modifications include minimising the level shifts of most factors in the fractional factorial design as well as applying split-plot and strip-plot methods throughout the trials. A full scale experiment run in a filter factory will be presented as a case study to illustrate these issues.

The original design plan for this case study was a standard 2^(9-4) fractional factorial design. However, in practice the number of level shifts was minimised for almost all factors, because they were difficult to change. The design was run more like a 2^2-design (raw materials) × a 2^4-design (moulding process) × a 2^3-design (impregnation process). That is, 4 batches of raw material were used to make a total of 32 moulded samples, which were divided into 8 groups of four samples. These groups were impregnated simultaneously following the 2^3-design for impregnation, also yielding randomisation restrictions.

Note that the filters were treated one by one in the moulding process, but treated in groups for the impregnation process. Issues regarding error structures for statistical modelling of experiments with such randomisation restrictions will be discussed and compared to a standard analysis of a 2^(9-4)-design.

13. The Practical Application of DoE: assessing the Effect of Time on Chemical Reactions

Marion J Chatfield, Statistical Sciences, GlaxoSmithKline, UK

The chemist asks the question: "should we incorporate time as a factor in our designs or measure the response at various time-points?" Advantages and disadvantages of the two approaches are discussed.
These include:
* the quality of the data collected and the information to be gained from the experimentation
* the effect of the approach on choosing a design
* implications for setting up, analysis and interpretation of designs in standard software available to our chemists
* pitfalls to avoid
This talk addresses not only the statistical issues related to this question but also the practical laboratory implementation issues. It illustrates the importance of a partnership between chemist and statistician in addressing the practical application of DoE.

19. Strategies for Multi-Response Parameter Design Using Loss Functions and Joint Optimization Plots

Martina Erdbrügge (speaker), Sonja Kuhnt, University of Dortmund, Germany

The development of high-quality products or production processes can often be greatly improved by statistically planned and analyzed experiments. Taguchi methods proved to be a milestone in this field, suggesting optimal design settings for a single measured response. However, these often fail to meet the needs of today's products and manufacturing processes, which require simultaneous optimization over several quality characteristics. Current extensions for handling multiple responses, such as those by Pignatiello (1993) or Vining (1998), assume that all responses are weighted beforehand in terms of costs due to deviations from desired target settings. Such information is usually unavailable, especially with manufacturing processes. As an alternative solution, we propose strategies that factorize cost matrices into a product of "standardizing" (data-driven) and "weighting" matrices and use sequences of possible weights assigned to each of the multiple responses. For each weighting, a design factor combination is derived which minimizes a respective estimated multivariate loss function and is optimal with respect to some compromise of the responses.

These compromises - in terms of predicted response means and variances - can be displayed graphically to the engineer in joint optimization plots; the engineer can thereby gain much more insight into the production process and draw more valuable conclusions. The proposed methods are applied to a data set from sheet metal forming.
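
A minimal numerical sketch of such a weight sweep is given below; the predicted means, variances, targets and the diagonal "standardizing" matrix are all invented for illustration and merely stand in for the factorization described in the abstract.

```python
import numpy as np

# Hypothetical predictions for 2 responses at 4 candidate factor settings:
# rows = candidate design-factor combinations, columns = predicted response means.
pred_mean = np.array([[9.8, 5.4],
                      [10.3, 4.9],
                      [10.0, 5.1],
                      [9.5, 5.0]])
pred_var = np.array([[0.20, 0.10],
                     [0.05, 0.30],
                     [0.10, 0.15],
                     [0.30, 0.05]])
target = np.array([10.0, 5.0])

# "Standardizing" matrix: a data-driven scaling, here simply the inverse of the
# spread of the predicted means (a stand-in for the factorization in the paper).
S = np.diag(1.0 / pred_mean.std(axis=0))

best = {}
for w1 in np.linspace(0.05, 0.95, 19):          # sweep of weights for response 1
    W = np.diag([w1, 1.0 - w1])                 # "weighting" matrix
    C = S.T @ W @ S                             # cost matrix = standardizing x weighting
    bias = pred_mean - target
    # estimated expected loss: squared-bias quadratic form plus a variance term
    loss = np.einsum("ij,jk,ik->i", bias, C, bias) + pred_var @ np.diag(C)
    best[round(w1, 2)] = int(np.argmin(loss))

print(best)   # loss-minimizing candidate setting for each weighting
```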

20. Characterising the Measurement Precision of Thermal Resistance Measurements


Jan Engel, Joris van Kempen (CQM) and Eric Bosch (Philips Research) The Netherlands

The measurement process of the Thermal Interface Resistance tester combines physical measurements and computer simulations to obtain the measured value of the thermal resistance of a material. The computer simulator is calibrated by 20 parameters that have been estimated from various experiments. The uncertainty in these parameters, as well as the error from the physical part of the experiment, contributes to the error in the thermal resistance. The following research questions arose: (1) which calibration parameter errors make a large contribution to the thermal resistance error, and (2) how does this error depend on the nominal value of the thermal resistance? There were some ideas about these influences from a sensitivity analysis, but answers had never been given.

In a stepwise DoE strategy we combined physical and simulation experiments to obtain the final model for the thermal resistance. We applied recent ideas on folded-over Plackett-Burman designs for screening to reduce the number of calibration parameters to 6, and then used a fractional split-plot design to obtain the final RSM. We applied this RSM to estimate the effect of the calibration parameter errors on the thermal resistance by means of a tolerance design procedure and performed a scenario study. The thermal resistance that was measured under standard conditions was included as a co-variable into the model. This has two benefits:

1. The physical experimental error is effectively eliminated when fitting the RSM.
2. The thermal resistance error is quantified also as a function of the nominal thermal resistance.

Our conclusion is that 2 out of 6 calibration parameters dominate the thermal resistance error, and that their effects strongly depend on the nominal thermal resistance. A final discussion concerns the selection of the RSM for this sort of scenario study, where standard model selection approaches may fail.
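
As a generic illustration of the tolerance-design step described above, the sketch below propagates assumed calibration-parameter errors through a hypothetical fitted RSM by Monte Carlo; the model coefficients and uncertainty values are invented, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical second-order RSM in two (of the six) calibration parameters x1, x2
def rsm(x1, x2):
    return 1.0 + 0.80 * x1 + 0.05 * x2 + 0.30 * x1**2 + 0.10 * x1 * x2

sd = {"x1": 0.05, "x2": 0.05}            # assumed standard uncertainties
n = 100_000

# Monte Carlo propagation of the calibration-parameter errors
x1 = rng.normal(0.0, sd["x1"], n)
x2 = rng.normal(0.0, sd["x2"], n)
y = rsm(x1, x2)
print("total sd of thermal resistance:", y.std(ddof=1))

# One-at-a-time contributions (the other parameter held at its nominal value)
print("x1 only:", rsm(rng.normal(0, sd["x1"], n), 0.0).std(ddof=1))
print("x2 only:", rsm(0.0, rng.normal(0, sd["x2"], n)).std(ddof=1))
```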

22. The Design of Experiments with more than One Blocking Factor

Peter Goos, Alexander N. Donev, Martina Vandebroek, Universiteit Leuven, Belgium

One of the important techniques for designing efficient experiments is blocking. The observations are divided into groups (blocks) of homogeneous units. This allows estimates of the parameters of interest to be obtained adjusted for the differences between the blocks. Depending on the experiment, the block effects can be modelled as random or fixed.

There is a considerable amount of literature on the design of experiments where the block effects can be treated as fixed. For example, see Atkinson and Donev (1989, 1992) and Cook and Nachtsheim (1989). However, in many experiments the blocks of observations are sampled from a large population for which an inference about the parameters of interest is required. The blocks in such studies are regarded as random and this should be taken into account when the experiment is designed. Goos and Vandebroek (2001) discuss this problem where a single random blocking variable exists. However, in practice there are often several blocking variables and some of them can be treated as random. For example, in a food industry experiment, the effect of adding enzymes in a starch extraction process on the yield was investigated. In the experiment, enzymes from three different suppliers were used as well as wheat from different crops. The suppliers can be regarded as fixed blocks. The origin of the wheat can be looked at as another blocking factor. However, this blocking factor was considered as random.

The algorithmic approach to the construction of experimental designs has proved effective and flexible when the experimental situation is complex. For example, the BLKL exchange algorithm of Atkinson and Donev (1989) can be used to obtain designs when the observations have to be divided into blocks and the block effects can be regarded as fixed. There is, however, no available solution to the problem described in this paper. Here we present an algorithm that extends the BLKL algorithm to generate designs that can be used in experiments with more than one random or fixed blocking variable. Its usefulness is illustrated with examples.
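
For orientation, the sketch below fits the kind of model such designs are constructed for: a fixed blocking factor (enzyme supplier) and a random blocking factor (wheat crop), on made-up data. It illustrates the analysis side only, not the extended BLKL construction algorithm.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Hypothetical starch-extraction data: 3 enzyme suppliers (fixed blocks),
# 6 wheat crops (random blocks), enzyme dose as the design variable.
rows = []
crop_effect = rng.normal(0, 1.0, 6)
for crop in range(6):
    for supplier in "ABC":
        for dose in (1.0, 2.0, 3.0):
            y = 50 + 2.0 * dose + {"A": 0, "B": 1, "C": -1}[supplier] \
                + crop_effect[crop] + rng.normal(0, 0.5)
            rows.append({"crop": crop, "supplier": supplier,
                         "dose": dose, "yield_": y})
data = pd.DataFrame(rows)

# Fixed supplier blocks, random crop blocks (random intercept per crop)
model = smf.mixedlm("yield_ ~ dose + C(supplier)", data, groups=data["crop"])
print(model.fit().summary())
```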

23. Utilisation of Robust Design - A Survey in Swedish Industry

Martin Arvidsson, Ida Gremyr, and Per Johansson, Chalmers University of Technology, Gothenburg, Sweden

This paper presents results from a survey regarding use of robust design methodology in Swedish industry. The survey has been conducted through 105 telephone interviews with representatives from different companies in the manufacturing sector. Findings from the analysis will concern the awareness of variation and the extent of use of statistical tools in a robust design framework.
The importance of developing and producing products insensitive to variation was pointed out as early as Morrison (1957) and Michaels (1964). Since then, a number of articles and textbooks have been written in the field of robust design; see Kackar (1985), Phadke (1989), and Taguchi (1993). This literature emphasises the importance of taking variation into consideration when developing products. Despite this, Thornton, Donnely and Ertan (1999) reveal that only one third of commercial companies use robust design proactively. The method seems to be used more frequently late in the design process, when the product enters the production phase.
Referring back to the findings in Thornton et al. (1999), the purpose of this paper is to explore the reasons for the limited use of robust design methodology in Swedish industry. Another objective is to reveal success factors for robust design applications.

29. Factor Screening, Factor Sparsity and Projectivity

John Tyssedal, Department of Mathematical Sciences, The Norwegian University of Science and Technology, 7491 Trondheim, Norway
The primary goal of a screening experiment is to identify the active factors. The design chosen in order to perform the screening is often a two-level design. The results from the experimentation are often analysed under various assumptions such as effect sparsity, effect hierarchy and effect heredity. These principles work well in many situations, but when the experimental region is close to the factor values that optimize the response or when a polynomial model will not give a good fit, they are of less value.

Another principle is factor sparsity, which only puts limitations on the number of active factors. In order to analyse the results from the experimentation assuming factor sparsity, the projective properties of the design used are of importance.

It has lately been shown (Box and Tyssedal, 1996, 2001) that non-geometric two-level designs may have better projective properties than fractionated designs, also called geometric designs. In this work it is demonstrated by examples how two-level designs can be analysed under the assumption of factor sparsity, taking only the projective properties of the design into account.
Some conclusions are:

* Non-geometric two-level designs are well suited for factor screening.
* Factor screening by two-level designs can be performed with weak assumptions on the underlying model.

60. A Scientist's View on Promoting Effective Use of Experimental Design

Martin Owen, GlaxoSmithKline, UK
This presentation tackles the practical application of statistical tools. Chemical Development is a rich area for collaboration between statisticians and chemists, but most statisticians encounter difficulties in promoting the tools to this client group. This paper, presented from a chemist's perspective, is intended to help bridge the gap between the two professions. The paper explores differences in uptake and attitudes towards adopting these techniques and considers what has hindered or helped the process of effective application. The top ten questions a chemist wants answered about experimental design will be addressed.
1: Why do we wish to embed experimental design more effectively into our scientific approach?
2: Is experimental design the answer to all our needs?
3: What's in it for me? I haven't got time.
4: "But I'm a well-qualified scientist - why should I need to use a statistical approach?"
5: It looks great...but when can I use it?
6: How can I persuade my manager that this is a good use of resource?
7: How can I use experimental design using that equipment .....?
8: Should experimental design be run as a service or should the chemist be self-sufficient?
9: Can you show me more examples of successful applications and examples where things have gone wrong?
10: So how can I learn more?

62. Comparison of Multivariate Methods for Robust Parameter Design in Sheet Metal Spinning

Göbel, R.; Auer, C.; Erdbrügge, M., University of Dortmund, Germany

In the production of workpieces by sheet metal spinning, methods of statistical experimental design help to achieve processes that meet the complex quality specifications and that are robust with respect to disturbing influences. However, due to the complexity of the spinning process, only univariate optimization of single quality characteristics has been carried out so far. In order to achieve further improvements, the analysis consequently has to be extended to robust multi-response parameter design.

In engineering applications especially, a Taguchi design is often recommended for robust parameter design. But the number of runs can be very large and the results are not always satisfactory. Alternatively, combined arrays can be applied, which, however, have other restrictions. Furthermore, the treatment of multiple responses has not yet been solved satisfactorily. Many different approaches exist that often result in conflicting recommendations regarding the parameter settings.

In the research work presented here, different multivariate methods for robust parameter design have been compared for the example of sheet metal spinning in order to choose a capable procedure. For this, a Taguchi design and a combined array were analyzed according to different multivariate methods from the literature, such as the Principal Component Analysis suggested by Su and Tong (1997), the Maximization of Desirabilities used by Byun and Kim (1998), or Loss Function based approaches as proposed by Pignatiello (1993). The results were compared regarding applicability, restrictions and outcomes.

It has been shown that, in the investigated spinning process, the approaches based on the simultaneous maximization of the overall desirability of the expectation and the variance are most appropriate. With this, a reliable approach is available that is also easy to handle for a process engineer with little statistical background. Based on these findings, a compromise robust parameter setting has been found for the spinning process. The method can also be used for the analysis of robust parameter designs in other fields of engineering with similar problems.

64. A Combined Design of Experiments and Computational Fluid Dynamics Approach to the Optimisation of a Wells Turbine Blade Profile

John Daly and Kevin Phelan AMT Ireland, University of Limerick, Limerick, Ireland, Patrick Frawley and Ajit Thakker, Dept. of Mechanical and Aeronautical Engineering, University of Limerick, Limerick, Ireland

This paper deals with the use of Design of Experiments (DOE) and Computational Fluid Dynamics (CFD) to optimise the design of a blade profile for the Wells Turbine. Using CFD it is possible to predict the aerodynamic characteristics of a given turbine blade geometry. In order to optimise the performance of the Wells Turbine it was necessary to understand the effect of the various factors controlling blade geometry and flow conditions.

Five factors were identified for the experiment: three of these controlled the blade profile, one the cascade geometry and one the flow conditions. Each experimental run generated a unique geometry, which was then modelled at a range of flow incidence angles, allowing prediction of the 3D performance of the turbine. The experiment was analysed initially as a half fractional factorial, but showed evidence of significant non-linearity with some of the factors. The initial design was then expanded into a central composite design to identify and correctly model the non-linear factors. By optimising the various responses a blade design was selected, which showed a significant improvement in the predicted 3D turbine performance.

This paper discusses the use of DOE in combination with numerical simulation and shows how, using response surface DOE methods, an unsatisfactory fractional factorial experiment can provide the basis for a comprehensive characterisation of the design variables.

68. Understanding Plackett and Burman Designs: Partial Confounding and Projective Properties of Plackett and Burman Designs

Murat Kulahci, Arizona State University, USA and Soren Bisgaard, University of Amsterdam, The Netherlands

Screening experiments are often used when a few significant factors are to be identified among many. It is relatively simple to generate screening experiments from two-level factorial designs. When the number of runs is a multiple of 4 but not a power of 2 (12, 20, 24, ...), Plackett and Burman (PB) designs provide an alternative set of two-level orthogonal designs that can be used in screening experiments. In most screening experiments, the main effects are confounded with two-factor interactions. In PB designs, however, the main effects are confounded with fractions of two-factor interactions. For a 12-run PB design, for example, each main effect is confounded with a fraction of a string of two-factor interactions. This has often been referred to as partial confounding, and it makes the analysis of PB designs difficult. On the other hand, recent discoveries about the projective properties of PB designs show that when only a subset of factors in a PB design is considered, one can obtain full and fractional factorials in those factors. For example, any three factors of a 12-run PB design give a full factorial plus a half-fraction design in those three factors. This suggests that if, after running the experiment, only three factors are found to be active, the original design can be reduced to the full factorial plus the half fraction, and the original data can be analyzed accordingly without any follow-up experiment. In this study we show that there is a close relationship between the partial confounding in PB designs and their projective properties. With the help of examples, we will also show that understanding this relationship will help experimenters to make better use of this valuable class of designs.
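
The projection property quoted above is easy to verify numerically. The sketch below builds the standard 12-run Plackett-Burman design from its usual cyclic generator row and counts the sign combinations in an arbitrary set of three columns; the full 2^3 plus half-fraction structure claimed in the abstract corresponds to counts of 1, 1, 1, 1, 2, 2, 2, 2.

```python
import numpy as np
from itertools import product

# Standard 12-run Plackett-Burman design from the usual cyclic generator row
gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
rows = [np.roll(gen, k) for k in range(11)]        # 11 cyclic shifts
rows.append(-np.ones(11, dtype=int))               # plus a row of all -1
pb12 = np.array(rows)

# Check orthogonality of the columns
assert np.all(pb12.T @ pb12 == 12 * np.eye(11))

# Project onto any three columns and count the 2^3 sign combinations
cols = (0, 3, 7)                                   # an arbitrary choice of three factors
counts = {combo: 0 for combo in product((-1, 1), repeat=3)}
for run in pb12[:, cols]:
    counts[tuple(int(v) for v in run)] += 1
print(sorted(counts.values()))                     # expected: [1, 1, 1, 1, 2, 2, 2, 2]
```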

70. Robust Design of a high-precision optical profilometer by Computer Experiments

Antonio Baldi (Dipartimento di Meccanica, Università di Cagliari), Alessandra Giovagnoli (Dipartimento di Scienze Statistiche, Università di Bologna), Daniele Romano (Dipartimento di Meccanica, Università di Cagliari)

A profilometer is a measuring device that inspects a testpiece surface on a selected area providing the relevant surface profile. An innovative profilometer is being designed at the Department of Mechanical Engineering of Cagliari University (Italy). The prototype is able to measure without contacting the inspected surface and is based on a complex measurement chain involving several optical devices. The measurement of each single profile point is achieved by a step-by-step procedure combining a sequence of hardware operations with a final software evaluation.
The quality of measurement, i.e. bias and uncertainty in the estimated profile points, depends on many design choices and parameters belonging to both the physical and the numerical part of the measurement chain. The effective design of the profilometer needs this dependency to be accurately assessed.
The problem has been split into two sub-problems and analysed in a bottom-up fashion. First, random variations of the data points, as generated by the hardware part of the chain and eventually processed by the numerical algorithm, have been characterised using both theoretical and empirical engineering knowledge. Then computer experiments have been run on the final software stage of the chain, estimating models for the mean and variance of the measurement result. Control factors are the parameters of the numerical algorithm and noise factors the random errors of the data points.
Up to now computer experiments have been used mostly as a substitute for physical experiments. Here experiments are made on the numerical part of the device and may well be regarded as a hybrid form of experimentation.

76. The Use of Observational Data to Implement an Optimal Experimental Design

Rossella Berni, Department of Statistics "G. Parenti" - University of Florence

The statistical methods tied to experimental design applied in industry are mainly directed at improving quality during the off-line stage, but also during the production process, when the product or the process must be revised. In this respect, it is rather common that the enterprise continuously collects large data sets over time, also called observational data, in which observations on factors and response values are collected to test the quality (sometimes also the reliability) of the product.

In this paper, we focus on these observational data and on the use of these large data sets to implement an experimental design without additional runs, for an efficient use of these data. More specifically, the proposed procedure is based on several steps, aimed at avoiding the problem of the lack of randomization, and on a final experimental design that can be regarded as optimal from the point of view of sequential optimality.

Briefly, the steps are: 1) the selection of a random sample of trials from the data sets; note that at this step the data involved are only the values of the factors, not the response values; 2) the use of a procedure to build up a design with n0 runs that maximizes |X'X|; 3) starting from the n0 runs, we build up the final experimental design using a sequential technique, involving the model specification and, obviously at this step, the response values.
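
A toy version of step 2 might look as follows: starting from a small random subset of the observed factor settings, rows are added greedily so as to maximize |X'X|. The data and the simple greedy (exchange-free) search are illustrative assumptions, not the authors' actual procedure.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical observational data: 200 recorded factor settings for 3 factors
F = rng.normal(size=(200, 3))
X = np.column_stack([np.ones(200), F])          # model matrix with intercept

n0, p = 12, X.shape[1]
# Step 1: a random sample of trials (factor values only, as in the abstract)
selected = list(rng.choice(X.shape[0], size=p, replace=False))

# Step 2: greedily add runs so as to maximize |X'X| (via the log-determinant)
while len(selected) < n0:
    M = X[selected].T @ X[selected]
    scores = np.full(X.shape[0], -np.inf)
    for i in range(X.shape[0]):
        if i not in selected:
            scores[i] = np.linalg.slogdet(M + np.outer(X[i], X[i]))[1]
    selected.append(int(np.argmax(scores)))

print("selected runs:", sorted(int(i) for i in selected))
print("log|X'X|:", np.linalg.slogdet(X[selected].T @ X[selected])[1])
```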

93. A Cross-Over Design for Comparing Two Types of Raw Material

Oystein Evandt, ImPro, Melumveien 44 A, N-0760 Oslo, Norway and Eivind Ovrelid, SiNor AS, Ornesveien 3, N-8160 Glomfjord Norway

The effect of differences in raw material on the mean value of a key performance parameter in a production process, e.g. the yield, is isolated from the effect of differences in production equipment and the effect of conditions varying between different production periods. The design makes it possible to construct a confidence interval for the effect of differences in raw material on the mean value of the key performance parameter. A limitation is that interaction-effects between equipment and raw materials, between equipment and period, and between period and raw materials must be assumed to be negligible. This assumption is however often realistic. (The design is often used to compare effects of different mitigating treatments of chronic diseases.) The design is illustrated by means of data from production of silicon single crystals, and the experimental setup is described.

Motivation
The expected outcome from a production process with respect to a key performance parameter, for example the yield, often depends on the type of raw material used. In situations where this is (or is assumed to be) the case, it is natural to conduct experiments in order to estimate the mean difference in process outcome obtained with different raw materials. It is necessary to take into consideration that two different production units, e.g. two machines or two ovens, are never exactly alike, even if they are nominally so. For comparison of types of raw material it is therefore logical to employ the same production equipment, so that a possible significant mean difference in outcome due to differences in raw material is not confounded with a possible effect of differences in equipment. However, environmental and other conditions that vary over time, such as temperature, air pressure, degree and type of air pollution, or operator skills, may cause significant differences in the mean outcome of different production periods. In such situations the following problem arises. Even if the same production equipment is used with different raw materials in different periods, it is not possible to discriminate between significant differences in mean process outcome arising from different types of raw material and significant differences in mean process outcome due to different environmental, or other, conditions in the various periods.

A cross-over design for comparison of two types of raw material, with respect to influence on the mean process outcome, in situations as described above, is presented.
The design is illustrated by means of data from production of silicon single crystals, and the experimental setup is described.
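
For concreteness, the sketch below analyses a hypothetical AB/BA cross-over of the kind described, with invented yields: the raw-material effect is estimated from within-unit period differences, so that equipment and period effects cancel, and a t-based confidence interval is formed under the stated no-interaction assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical yields (%) from a cross-over run: each production unit (oven) is used
# in two periods, receiving raw material A in one period and B in the other.
# Sequence AB = A in period 1, B in period 2; sequence BA the other way round.
yield_AB = np.array([[86.1, 83.9],     # rows: ovens, columns: period 1, period 2
                     [84.7, 82.8],
                     [85.9, 84.0]])
yield_BA = np.array([[83.5, 85.8],
                     [82.9, 85.1],
                     [84.1, 86.3]])

d_AB = yield_AB[:, 0] - yield_AB[:, 1]      # period-1 minus period-2 differences
d_BA = yield_BA[:, 0] - yield_BA[:, 1]

# Raw-material effect (A minus B); period and oven effects cancel out
effect = (d_AB.mean() - d_BA.mean()) / 2.0

# 95% confidence interval from the two-sample t statistic on the differences
n1, n2 = len(d_AB), len(d_BA)
sp2 = ((n1 - 1) * d_AB.var(ddof=1) + (n2 - 1) * d_BA.var(ddof=1)) / (n1 + n2 - 2)
se = 0.5 * np.sqrt(sp2 * (1 / n1 + 1 / n2))
t = stats.t.ppf(0.975, n1 + n2 - 2)
print(f"estimated A-B effect: {effect:.2f} +/- {t * se:.2f}")
```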

Statistical modelling

2. Simulation Models for Robust Design using High-Dimensional Location Depth Methods

Gerhard Rappitsch, Michael Kocher, Austria Micro Systems AG and Ernst Stadlober, Institute of Statistics, Graz University of Technology

The robustness of a mixed-signal electronic design is defined as its functional invariance to allowed variations of the production process. In order to create a robust design and to avoid cost-intensive design iterations, the spread of the electronic performance must already be observed during circuit simulation at an early design stage. Traditional methods use worst case simulation models for the individual semiconductor devices like MOS transistors, resistors and capacitors. For this purpose corner models are determined by combining production control parameter limits to maximize a certain device performance. The problem of worst case modelling is that by independently combining univariate parameter limits, correlations in the high-dimensional space are ignored and the resulting simulation results are too pessimistic. We present a statistical method which generates parameter sets of the device model that cover the process variation with a minimum number of circuit simulation runs. The corners of the parameters are determined in the space of production control data by applying a high-dimensional location depth method, the so-called boundary extension method. Furthermore, a statistically typical parameter set for efficient design centering is calculated. The corner parameters are determined and transformed to the domain of circuit simulation parameters (SPICE parameters) via a special linear mapping. To validate both the applied statistical methods and the mapping procedure, results of circuit simulation and measured results of analog/mixed-signal benchmark designs are compared. To be applicable in an industry-standard circuit design environment, the models are integrated into an automated parameter generation flow for usage within a CAD framework. As an example, a robust stability design of a CMOS operational amplifier is demonstrated.

5. Statistical Efficiency- The Practical Perspective

Ron S. Kenett, KPA Ltd, Raanana 43100, Israel

The idea of adding a practical perspective to the mathematical definition of statistical efficiency is based on a suggestion by Churchill Eisenhart who, years ago, gave a new definition of statistical efficiency in an informal "Beer and Statistics" seminar. Later on, Bruce Hoadley from Bell Laboratories picked up where Eisenhart left off and added his version. Blan Godfrey, former CEO of the Juran Institute, used Hoadley's idea during his Youden Address at the Fall Technical Conference of the American Society for Quality Control in 1988. This report expands on this idea by adding an additional component, V(D), the value of the data actually collected, which we believe is critical to the overall approach. The formula for computing the expanded concept of Practical Statistical Efficiency will be presented and illustrated with real life examples.

7. Image Analysis of Pipelines

Ragnar Bang Huseby, Norwegian Computing Center
Hans Rossavik Gundersen and Tor Arne Paulsen, Stolt Offshore

Pipelines with a total length of several thousand kilometres are currently used for extraction and transportation of oil and gas in the sea. Most of the pipelines are located on the seabed, fully or partly exposed, while a minor part is buried. Monitoring the placement of the pipes, mapping the surrounding seabed topography, and detecting possible damage are important tasks. The state may change over time, and free spans may arise. Free spans affect the pipeline stress and may cause problems for trawlers.

The inspection of the pipelines is partially done by echo-sounder equipment. The acquired images describe the topography in planes containing cross-sections of the pipes. In this presentation, we outline a statistical approach for recognizing the pipes in these images. The method is semi-automatic and has been successfully used in several inspection tasks. Due to the large amount of data involved, this approach reduces the costs greatly in comparison to manual analysis.

The methodology is based on the assumption that the cross-section of a pipe is circular. It is well known that circles in images can be detected by Hough Transform techniques. However, segments of the seabed profile similar to a circle segment occur frequently. In such cases there will be several candidate centres in the cross-section. The fact that the measurements of location are inaccurate complicates the task of finding the correct candidate.

In order to select the correct centres we utilise information from neighboring cross-sections. The centres from the sequence of cross-sections are linked together such that the curvature of the resulting pipeline is small. The algorithm is based on a statistical model that combines circle detection results and curvature information.

11. Coping with Historical Reliability of Expert Elicitation in the Integration of Subjective Judgements and Historical Data

Enrico Cagno, Franco Caron, Mauro Mancini (speaker), Politecnico di Milano, Italy, Jesus Palomo, David Rios Insua, Universidad Rey Juan Carlos, Madrid, Spain and Fabrizio Ruggeri, CNR-IMATI, Milano, Italy

During the early "conceptual" phase of a project life-cycle (considering for instance a competitive bidding process, when a request for bidding has been received by an engineering & contracting company and the decision to bid has been made), the main objective of the proposal manager is to achieve an effective trade-off between the bid competitive value, on the side of the client expectations, and the project baseline in terms of time/cost/performance constraints, on the side of the utilisation of the internal resources. Since project final performance depends primarily on risk analysis and management, a "risk-driven approach" to Project Management appears to be necessary, particularly during the early project phase, when only scarce information is available and contractual obligations are to be taken on. Thus, the application of risk analysis methodologies to identify and evaluate possible deviations in project completion in terms of time, cost and performance is tending more and more to become an essential prerequisite for project management quality. Since projects are non-repetitive processes, historical data are scarcely useful and subjective judgements constitute the main source of information on the different factors influencing project development. For predictive purposes, the integration of available historical data - which is inevitably limited by the uniqueness of projects - and the subjective judgement of specialists based on previous experience in similar projects is, therefore, an inherent issue in the project management process. The paper proposes a systematic and rigorous methodology to collect and integrate the input data needed for simulation analysis by means of Bayesian inference, taking into account the historical reliability of expert elicitation. The methodology aims to obtain effective estimates of the duration, cost and performance of elementary activities, in order to evaluate the probability distribution of project completion time, cost and performance. It is applied to a real-world case of practical interest involving the planning of a project concerning a process plant.

27. Bayesian Localisation of Mobile Phones

Arnoldo Frigessi, Norwegian Computing Center, Norway
One of the challenges of the telecom industry is to develop a tool able to locate a mobile phone on the basis of the currently operating technology, which is not GPS based. The localisation in space should be precise and very rapidly computable. Only standard signal strengths and time delays should be used. We propose a Bayesian solution of this problem, which is able to incorporate prior information on the territory, measurement error and equipment in use. One of the major issues is the distinction between in and outdoors locations.

28. Progress on Repeated Screening with Inspection Error and No False Positives

Professor Mauro Gasparini, Politecnico di Torino, Italy, and Dr Jeffrey Eisele, Novartis Pharma AG, Switzerland
A pharmaceutical application to the quality control of batches of pills motivated the following definition. Repeated screening is a 100% sampling inspection of a lot followed by removal of the defective items and further iterations of inspection and removal. The reason for repeating the inspection is that the detection of a defective item happens with probability p < 1, called the sensitivity of the screening. A missed defective is a 'false negative'. Given the pharmaceutical scenario, false negatives are possible, but no false positive is contemplated. Bayesian posterior distributions for the quality of the lot were obtained in previous work.
Progress on theoretical connections to estimating the number-of-trials parameter of a binomial distribution, on the computation of control limits and on the computation of limiting posterior distributions will be presented and discussed.
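
A small simulation of the screening mechanism (not of the Bayesian analysis itself) may help fix ideas; the lot size, sensitivity and number of rounds below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def repeated_screening(n_defectives, sensitivity, n_rounds):
    """Simulate repeated 100% inspection with sensitivity p and no false positives.

    Returns the number of defectives remaining after each inspection round."""
    remaining, history = n_defectives, []
    for _ in range(n_rounds):
        detected = rng.binomial(remaining, sensitivity)   # each defective found with prob p
        remaining -= detected                             # detected items are removed
        history.append(remaining)
    return history

# Hypothetical lot with 20 defectives, detection sensitivity 0.8, 4 inspection rounds
print(repeated_screening(20, 0.8, 4))
# Expected number remaining after k rounds is n * (1 - p)^k
print([round(20 * 0.2**k, 2) for k in range(1, 5)])
```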

30. Outlier and Group Detection in Sensory Analysis Using Hierarchical Clustering with the Procrustes Distance

Tobias Dahl, Department of Informatics, University of Oslo, N-0316 Blindern, Norway. and Tormod Næs, Norwegian Food Research Institute, 1430 Ås, Norway and Department of mathematics, University of Oslo, N-0316 Blindern, Norway.

Generalised Procrustes Analysis (GPA) is a well-known method for analysing three-way data tables. In particular, it is often used to find a representative average for a set of matrices (configurations, shapes, profiles etc.). In this paper it is suggested to use GPA in combination with hierarchical clustering in situations where the data matrices are expected to come from different groups. The Procrustes distance is used as the dissimilarity measure for the cluster algorithm. Applications of the method for analysis of data from sensory panels will be given. It will be shown how the method at an exploratory stage, in combination with regular GPA, can be used to gain insight about important structures of the data set. In particular, it can help the researcher to detect outliers and find sub-groups, help him/her make decisions regarding further analysis, and reduce the risk of erroneous inference.
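
A compact sketch of the proposed combination, using SciPy's Procrustes disparity as the dissimilarity measure and average-linkage hierarchical clustering on simulated sensory profiles (the panel size and noise levels are invented):

```python
import numpy as np
from scipy.spatial import procrustes
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(5)

# Hypothetical sensory profiles: 10 assessors, each a 6 products x 4 attributes matrix
base = rng.normal(size=(6, 4))
profiles = [base + rng.normal(scale=0.2, size=(6, 4)) for _ in range(8)]   # one group
profiles += [rng.normal(size=(6, 4)) for _ in range(2)]                    # two "outliers"

# Pairwise Procrustes disparities as the dissimilarity measure
n = len(profiles)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        _, _, disparity = procrustes(profiles[i], profiles[j])
        D[i, j] = D[j, i] = disparity

# Hierarchical clustering on the Procrustes distances
Z = linkage(squareform(D), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))   # cluster labels; outliers tend to separate
```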

40. Dynamic Models with Expert Input with Applications to Project Costing

Jesus Palomo, David Rios Insua, Statistics and Decision Sciences Group, Universidad Rey Juan Carlos, Madrid, Spain, and Fabrizio Ruggeri, CNR-IMATI, Milano, Italy
We describe a Bayesian approach for inference and prediction based on a dynamic model and expert's opinion.
We consider forecasting with dynamic models when expert opinion is available at specific forecasting points. Expert input is treated as information used to improve upon our model and to learn about the expert's quality. Examples of this situation abound in forecasting problems; here we aim at forecasting the daily average electricity selling price in euro/MWh on the recently deregulated electricity market in Spain with a dynamic linear model, using a financial journal's daily forecasts of that price.
These dynamic models attempt to aid in forecasting costs for a company taking part in auctions, for example, reducing the risk of overrun.
Keywords: Dynamic models, Expert opinion, Project costing, Auctions

45. Analysis of Communication Capabilities of Electronic Power Meters

Antonio Pievatolo, Bruno Betro and Carla Brambilla, Milan, Italy

In the last few years, a new technology has become available for using the electric network to transmit data. Electronic power meters can be remotely controlled to give readings of the consumption and perform various operations. However, their performance can be influenced by the conditions of the network and the type of loads connected to them.
CESI (an electrical experimental centre in Milano) set up a large experimental field to study the relationship between the quality of the communication and various covariates, and we analysed the data from the experiments, the key response variable being the fraction of information packets transmitted correctly.
The probability of success of a single packet is linked to the covariates via generalised linear modelling, and it constitutes the main parameter of the model for the observed data, which arises as a modification of a truncated geometric distribution. The modifications are due to random exogenous causes of failures and to the possibility of re-sending packets a few times after a failure.
With a detailed preliminary analysis of the data and with the help of the model, we discovered a threshold in the signal-to-noise ratio which is critical for an effective communication.
Other factors, which were thought to be influential before the experimentation, were found to have an appreciable but small effect.
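
The modelling step can be sketched generically as a binomial GLM for the per-packet success probability; the covariates, the threshold location and the data below are simulated stand-ins, not the CESI field data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)

# Hypothetical field data: signal-to-noise ratio (dB) and a load-type indicator
# for each meter, with the number of packets sent and correctly received.
n = 200
snr = rng.uniform(-5, 15, n)
load = rng.integers(0, 2, n)                                  # 0/1 disturbing load type
sent = rng.integers(50, 200, n)
p_true = 1 / (1 + np.exp(-(0.6 * (snr - 3) - 0.4 * load)))    # threshold region near 3 dB
received = rng.binomial(sent, p_true)

# Binomial GLM linking the per-packet success probability to the covariates
X = sm.add_constant(np.column_stack([snr, load]))
model = sm.GLM(np.column_stack([received, sent - received]), X,
               family=sm.families.Binomial())
res = model.fit()
print(res.params)        # the SNR coefficient locates the critical threshold region
```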

52. Calibration of a Biological Reference Material: Parallel Line Assay

Antje Christensen, Søren Andersen, Anne Munk Jespersen, Novo Nordisk, Denmark

The potency of a blood clotting agent is measured against a standard by means of a biological assay. The statistical method used is that of a parallel line assay, that is, the relative potency is determined as the horizontal distance between the logarithmic dose-response curves. The calibration of a new secondary reference standard against a primary standard using a parallel line assay is presented. The determination of confidence limits for the relative and absolute potency of the secondary reference standard is discussed.

A measure for the effect of changing from a former to the newly calibrated secondary standard is considered. This measure is a straightforward generalization of a measure established for one-point calibration. Confidence limits for the measure are discussed.

Furthermore, a different chemical analysis of the same agent, the content analysis, is considered. This is a chromatographic method, which in the laboratory is based on three-point calibration. The model of three-point calibration is discussed and compared to those of one-point and parallel-line calibration.
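
The core parallel-line computation can be sketched in a few lines: fit two straight lines with a common slope to the log-dose responses of the standard and the test preparation, and read the log relative potency off as the horizontal offset between them. The doses and responses below are invented.

```python
import numpy as np

# Hypothetical parallel-line assay data: responses at three log-doses
# for the primary standard (S) and the new secondary standard (T).
log_dose = np.log10(np.array([1.0, 2.0, 4.0]))
resp_S = np.array([0.42, 0.61, 0.80])
resp_T = np.array([0.47, 0.66, 0.85])

# Fit two parallel straight lines: common slope, separate intercepts.
X = np.column_stack([
    np.ones(6),                                  # intercept for S
    np.r_[np.zeros(3), np.ones(3)],              # extra intercept for T
    np.r_[log_dose, log_dose],                   # common slope
])
y = np.r_[resp_S, resp_T]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
a_S, d_T, b = beta          # intercept (S), intercept shift (T), common slope

# Relative potency: horizontal distance between the two parallel lines
log_rel_potency = d_T / b
print("relative potency T vs S:", 10 ** log_rel_potency)
```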

53. Calibration of a Chemical Reference Material: One Point Calibration

Søren Andersen, Ulla M. Riber, Novo Nordisk, Denmark

An example is presented in which a new secondary reference material (NEWSRM) is calibrated against a primary reference material (PRM) using a chromatographic analysis and a one-point calibration. NEWSRM is also compared to the present reference material (PRESRM) to measure the effect of changing from PRESRM to NEWSRM. Each chromatographic series contained four vials of NEWSRM, four vials of PRESRM and two vials of PRM. In total 11 series from three (slightly different) analytical methods were run over several days, and 110 responses (areas) were measured.

The ratio of the area from NEWSRM to the area from PRM was estimated in a normal variance component model for log(area) comprising a different residual variance for each reference material within analytical method, and a random method*day interaction. The calibrated value for NEWSRM was obtained from the estimated ratio and from the certified value of PRM.

Confidence limits for the calibrated value were calculated with and without the uncertainty from the certified value of PRM.

The vial-to-vial variation of NEWSRM was estimated in a separate study in which 15 vials were analyzed in duplicate in a single series. It is discussed how vial-to-vial variation should influence the confidence limits for the calibrated value of NEWSRM.

56. Baseline Uncertainty in Geometric Tolerance Inspection by Coordinate Measuring Machines: The Case of Position Tolerance with Maximum Material Condition

Gabriele Brondino, Politecnico di Torino, Turin, Italy, Daniele Romano, University of Cagliari, Italy, and Grazia Vicario, Politecnico di Torino, Turin, Italy

Today's industrial metrology requires substantial support from statistics. Point values are taken as realizations of a random variable whose variability defines, and estimates, measurement uncertainty. Two main kinds of uncertainty are considered by the standards: type A, which may be estimated using statistical procedures, and type B, which requires other methods, such as accumulated technical knowledge, mathematical models and so on. This paper deals with type A uncertainty in the evaluation of the geometric conformance of mechanical components as measured with Coordinate Measuring Machines (CMM), widely used for such applications.

CMM are generally operated under computer control, returning as response sets of x, y, z Cartesian coordinates pertaining to contact points between a touch probe and the surface explored. A major problem with CMM is that surfaces may be inspected only partially, as only a finite sample of all possible points may be probed; incorrect decisions about acceptance/rejection of parts may therefore be made. This is a sore point with geometric tolerance verification, since the relevant standards were developed taking into consideration the properties of hard gauges; these involve the entire surface in the testing process, unlike CMM.

Evaluation of the loss of information due to finite sample size, a typical statistical problem, therefore becomes of paramount importance with CMMs. Difficulties arise whenever verification calls for inspection of a number of different surfaces, with a potential accumulation of uncertainty. This is the case for position tolerances, where as many as four surfaces (plus some dimensions) can be involved.

In the paper we examine one of the more complex cases of geometric tolerance verification: position tolerance including some features at the Maximum Material Condition. The uncertainty estimation is provided by a Monte Carlo simulation of the whole measurement procedure implemented in a computer program. Results point out some interesting inconsistencies between the tolerance standard and the relevant practice on CMM.

79. Analysis of the Travel Safety Indicators Along the Motorway E-75 Using ANOVA

Slobodan Kiproski and Branislava Jakovljevic, University of Novi Sad, Serbia

Within the preparations for the Olympic Games in 2004 in Athens, it will be necessary to undertake the rehabilitation of the existing pavement structure of the Pan-European route corridor X-B across Serbia. The existing semi-motorway section from the Hungarian border to Belgrade has been marked as asymmetrical by its horizontal marking.
As vehicles move, so-called risk situations have been noted in the cross section, that is, partial or full usage as a travel lane of the emergency lane, which has been constructed with a pavement structure of low bearing capacity. This certainly leads to its quicker deterioration, with all the consequences for the durability of the structure and for travel safety.

During the autumn of 2001, monitoring of the traffic was performed on the characteristic cross section of the Hungarian border - Belgrade section, with the aim of evaluating the risk factors generated by the usage of the emergency lane as a travel lane. Monitoring was performed continually, by video camera and radar, in daily and weekly peak hours.

In practice, conclusions about risk factors are based primarily on experience and application of simple statistics such as mean value, variance and percentages.

The idea behind this report is to establish a model for more precise quantification of the risk factors. For the problem formulation an ANOVA model is appropriate. On the other hand, the normality assumption is inadequate, since all traffic problems are related to the Poisson distribution. Therefore nonparametric models are used for the data analysis fitting the risk factors. Continuous monitoring provides enough data for a large number of replications.

90. Assessing the Cost of providing Quality in Inventory Systems

Claudia Uhlenbrock and Dirk Lehnick, University of Göttingen, Germany

There exist several criteria to measure the performance of inventory systems. Traditionally the focus has been on the minimization of the inventory costs required to meet some one-dimensional objective. Thus the standard α- and β-service levels are concerned with the probability of immediate delivery from stock on hand, without an inventory-related delay. However, from the point of view of the customer (here taken to include any downstream node in a supply chain) it is of interest to assess not only the probability that an order can be met immediately but also, in cases where it is not met, the probability of meeting the order within a specified time. Hence the quality of an inventory system needs to be measured in two or more dimensions. The purpose of this paper is to illustrate how one can assess the cost of providing different levels of quality, where the latter is specified in terms of the waiting time distribution.

As an application we consider a single-product single-location inventory system with stochastic demand and present a procedure to model the probability distribution of the inventory-related waiting time. Using this model we quantify the cost of achieving a set of quality criteria based on the waiting times, for example, specifying lower bounds for the probability of immediate delivery and for the probability that the waiting time will not exceed a given number of periods.
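
A simple simulation of the kind of waiting-time distribution discussed above, for a periodic-review order-up-to policy with Poisson demand (the policy, the parameter values and the FIFO backorder handling are illustrative assumptions, not the authors' model):

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(11)

def waiting_times(base_stock, lead_time, mean_demand, n_periods=50_000):
    """Periodic-review order-up-to policy with Poisson demand; returns per-unit waits."""
    on_hand = base_stock
    pipeline = deque([0] * lead_time)          # orders placed, arriving lead_time periods later
    backorders = deque()                       # [period demanded, units] waiting to be served
    waits = []
    for t in range(n_periods):
        on_hand += pipeline.popleft()          # receive the order placed lead_time periods ago
        demand = rng.poisson(mean_demand)
        pipeline.append(demand)                # order-up-to: reorder exactly what was demanded
        backorders.append([t, demand])
        while backorders and on_hand > 0:      # serve outstanding demand first-come first-served
            period, units = backorders[0]
            served = min(units, on_hand)
            waits.extend([t - period] * served)
            on_hand -= served
            if served == units:
                backorders.popleft()
            else:
                backorders[0][1] -= served
    return np.array(waits)

w = waiting_times(base_stock=25, lead_time=3, mean_demand=5)
print("P(wait = 0) :", np.mean(w == 0))        # classical immediate-delivery measure
print("P(wait <= 2):", np.mean(w <= 2))        # second dimension of the quality criterion
```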

Statistical consulting

3. Establishing a Statistical Consulting Unit at a University

RJJ Does, IBIS, University of Amsterdam

In this lecture an overview is given of the ideas behind establishing a statistical consulting unit at a Department of Statistics or Mathematics. A comparison between commercial and noncommercial units is also made. We stress that if a suitable environment for research and its reward are ensured for the consultants, the members of staff participating in commercial consulting have career opportunities comparable to (or even better than) those of ordinary university graduates. Furthermore, we present an example of a statistical consulting unit at the University of Amsterdam.

15. Small to Medium Sized Industries and the Importance of Reliable Measurement

Kim Pearce, Shirley Coleman, Matt Linsley, Frøydis Bjerke, Lesley Fairbairn, Dave Stewardson, University of Newcastle upon Tyne, UK

This paper discusses the second Business to Business (B2B) mentoring programme for measurement reliability, which was conducted over a seven-month period during 2001. The scheme was organised by the Regional Technology Centre North Ltd. (RTC), with the Industrial Statistics Research Unit (ISRU) acting as topic expert and a large host company acting as mentor. Eight small to medium enterprises (SMEs) located in the North East of England volunteered to take part in the scheme. It was found that regular meetings coupled with interesting 'hands-on' practicals brought down barriers which could have existed between industrialists and statisticians/academics. A suite of statistical techniques was suggested to improve each company's measurement reliability, and it was found that those SMEs whose management and other workers took an active interest in implementing the methods were those who benefited the most. Ultimately the B2B project led to an increased awareness of the areas which could be improved and how this could be achieved, better morale amongst the workforce and an improvement in measurement practices. By integrating with the workforce within the various SMEs, strong relationships were established between all parties involved; hence the work continued with several of the companies after the project had terminated. The presentation will describe the B2B programme, the investigation of measurement reliability and the results achieved. It will also discuss the requirements in terms of company commitment for a successful collaboration.

49. Statistics - A Way to Quality Improvement

Helle Doré Hansen and Dorte Rehm, Quality Support, Statistics, Novo Nordisk A/S

The management of the diabetes bulk production at the Danish pharmaceutical company Novo Nordisk A/S decided in October 2000 to use statistics more efficiently as a tool to improve quality and productivity. At Novo Nordisk A/S approximately 1000 employees, of whom 270 are chemists and chemical engineers, work in the diabetes bulk production, covering fermentation, recovery and purification of insulin. Historically, the requirements, and thereby the quality focus, in the bulk production have been less restrictive than for e.g. the insulin filling plants. However, in the future the authorities will increase their focus on quality assurance in the bulk production. The site management believes that statistics is an important tool to increase the knowledge of the processes and thereby improve product quality.

In order to comply with the upcoming requirements a strategic vision has been planned by the site management and the statistical department to increase the level of statistical knowledge among the chemists at the site. Depending on the individual level each chemist has to take appropriate statistical courses followed by an individual project using statistics as a tool.

The strategic plan will be presented, as well as examples of the finalised statistical projects that have helped increase productivity and/or resulted in quality improvement.

57. What Does the Future Hold for Statistics, Statistical Thinking, and Statisticians?

Yu ADLER, Ph.D., Prof. of Moscow Institute of Steel and Alloys and V. SHPER, Ph.D., Russian Electrotechnical State Institute, Moscow, Russia

In this paper we try to discuss the following problems:
* How will the main features of the near future affect "Applied Statistics"?
* What will happen to "Statistical Thinking"?
* What are the implications of future events for statisticians?
* What should we do now in order to meet the demands of a rapidly changing world?
It seems to us that the statistical community is facing these or similar questions, so it is very important to start gathering different opinions on them and to try to discuss them, even though most predictions of the future never come true.

66. Statistical Types and Client Types

Roland Caulcutt, Caulcutt Associates, UK

Does the statistical profession attract certain types of people? In what ways do statisticians differ from the people of other disciplines they meet in the work environment? Is it helpful for a statistical consultant to understand how clients may see the world rather differently?

Theories of personality can shed light on the differences between statistician and client and can indicate how their interactions can be made more effective. The Myers-Briggs Type Indicator (MBTI) is one classification of personality types that has been found useful in this context. The MBTI is non-threatening. It focuses on how people prefer to take in information, how they prefer to reach decisions and how willing they are to take action.

Delegates will be encouraged to identify their personality type, to compare this with the personality of other statisticians and to consider what difficulties might arise in their interaction with the type of clients they are likely to encounter in various organisations.

To identify your personality type in advance, please read these instructions.

Business and economics

4. Adding Value with Analysis of Operational Data

Kees den Heijer and Philip Jonathan, Statistics & Risk, Shell Global Solutions, International BV, Amsterdam.

Using operational data to best effect in decision-making is a challenge all industrial statisticians face. Modern operational data historians provide a wealth of multivariate time-series data on process and business operation across many fields. To get the most from these data, we need to recognise and exploit the full (multivariate and time-series) structure and adopt appropriate solutions.

This talk will illustrate how industrial statistics is making an impact within Shell, from both the user's and the statistician's perspective. Modern statistical solutions for industry consist of an essential mix of software, training and expert consultancy. For the statistician, a working mix of practical consulting and R&D is equally important.

We will illustrate one software solution developed for users of real-time operational data historians. The software is focussed on the typical sequence of analysis steps that are taken, from data cleaning, reconciliation and visualization, to predictive modelling and validation. We will emphasize the critical role of data cleaning and model validation. Potential users of the software tool first attend a statistical workshop, where the essential ingredients of the software are explained and illustrated. The users are further trained to recognise situations in which expert advice should be sought. A review of some case studies will illustrate the successful application of statistics in manufacturing.

The Statistics & Risk group undertakes a number of research projects to support current consultancy and develop the next generation of statistical solutions. Current topics, including spatial dynamic linear modelling, will be outlined at the end of the talk.

9. Cash-Race - How Business Can Use Improvement Tools and Techniques to Maximise Cash Flow within its Business

Jonathan Smyth-Renshaw, Rolls Royce, UK

I wish to talk about a method of improvement called 'Cash Race'; as the name suggests, the method examines how a business can use improvement tools and techniques (both lean and quality) to maximise the cash flow within the business. The method starts by asking what the cash balance of the business is: is it stable and predictable?
This leads to a discussion of how cash flows in and out of a business, which in turn leads to the internal workings of the business. The method gives the 'whole picture' of the business and therefore ensures that improvement programmes are focused rather than a 'shot gun or shop floor only' approach. Furthermore, the benefit is measured by cash in the bank, and we all want that!

There is nothing new here, but the approach is focused and gets the interest of management, as it is cash, not 'the quality department', that drives the improvement. The result is that the improvement tools and techniques are used, as that is the only way to deliver improvements.

18. Verification of Uncertainty Budgets - Theory and Practice

Birger Stjernholm Madsen, Metrology Department, Novo Nordisk A/S

The purpose of this presentation is to present methods and applications for the experimental verification of "uncertainty budgets". According to GUM ("Guide to the expression of uncertainty in measurement", ISO 1995), "uncertainty budgets" are constructed using a "bottom-up" approach, using estimates ("standard uncertainties") of each source of error. These estimates can be either "Type A" (estimates from data) or "Type B" (use of other existing knowledge). The standard uncertainties are then propagated through a model function to a combined standard uncertainty.

This approach builds on a number of assumptions, e.g. that all uncertainty sources are identified, and that the model function is correctly specified. If these assumptions are not valid, the uncertainty budget may be of limited value. It thus becomes a very important task to verify the uncertainty budget using (new) experimental data.

Other approaches for assessment of measurement uncertainties exist. ISO 5725 (1-6) "Accuracy (trueness and precision) of measurement results" uses the "top-down" approach, i.e. experimental determination of the accuracy of a standardized measurement method by calculating measures of "repeatability", "reproducibility" and "intermediate precision".

Verification of an uncertainty budget can in principle be done using a precision experiment according to ISO 5725 by comparing the combined standard uncertainty with the reproducibility from the experiment. This can be considered a full verification. A simpler method for full verification of an uncertainty budget is also presented.

Methods for sequential verification are also presented. These include methods for demonstrating that "Type B" estimates of uncertainty are of the correct order of magnitude, and that the neglected uncertainty sources are indeed negligible.
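
As a rough illustration of the "bottom-up" combination described above, the following minimal sketch (not the author's implementation; the measurand, sensitivities and uncertainty values are all hypothetical) propagates uncorrelated standard uncertainties to a combined standard uncertainty by first-order GUM-style propagation.

```python
import numpy as np

def combined_standard_uncertainty(grad, u):
    """First-order propagation: u_c^2 = sum_i (df/dx_i)^2 * u_i^2,
    assuming uncorrelated input quantities."""
    grad = np.asarray(grad, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(np.sqrt(np.sum((grad * u) ** 2)))

# Hypothetical example: measurand y = f(x1, x2) = x1 * x2 at x1 = 2.0, x2 = 5.0
x1, x2 = 2.0, 5.0
grad = [x2, x1]        # partial derivatives df/dx1, df/dx2
u = [0.01, 0.05]       # standard uncertainties (Type A or Type B) of x1, x2
u_c = combined_standard_uncertainty(grad, u)
print(f"combined standard uncertainty u_c = {u_c:.4f}")
```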

38. Stochastic Modelling of Insurance Business

Dmitrii Silvestrov and Evelina Silvestrova, Mälardalen University, Sweden
Results of the development of a pilot program system, SMIB (Stochastic Modelling of Insurance Business), for stochastic simulation of insurance business with dynamic control of investments are presented. The basic idea of the SMIB program is to use the Monte Carlo method to produce multiple time scenarios for the capital of an insurance company. The capital can be invested in different types of assets (shares, bonds, real estate, etc.). A multi-parameter non-linear dynamical model (about 20 components) is used to simulate the insurance business and the investment process. Premiums, claims, and returns for the different types of investments are described by equations of autoregressive type. Multi-threshold, time-non-stationary strategies for the dynamic re-distribution of capital across the different branches of investment are used. Analytical methods do not work for such complicated non-linear and non-stationary models, but Monte Carlo simulation does. The program has a developed graphical interface. The output information includes a histogram of the distribution of capital at a given time horizon, its expectation, variance and quantiles, ruin probabilities, etc. The results of the simulation studies carried out so far show that non-stationarity and discontinuity of investment strategies sharply affect the capital distributions, which can be highly non-symmetric and multi-modal.
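
The SMIB system described above is a roughly 20-component non-linear model; the sketch below is only a drastically simplified stand-in illustrating the Monte Carlo idea: AR(1)-type premiums and claims, a single investment return, and empirical quantiles and ruin frequency computed over simulated capital paths. All parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_capital(n_paths=10_000, horizon=20, c0=100.0):
    """Simplified Monte Carlo of an insurer's capital: AR(1) premiums and claims,
    i.i.d. lognormal gross investment returns (illustrative parameters only)."""
    capital = np.full(n_paths, c0)
    premium = np.full(n_paths, 30.0)
    claims = np.full(n_paths, 25.0)
    ruined = np.zeros(n_paths, dtype=bool)
    for _ in range(horizon):
        premium = 30.0 + 0.5 * (premium - 30.0) + rng.normal(0.0, 2.0, n_paths)
        claims = 25.0 + 0.7 * (claims - 25.0) + rng.normal(0.0, 5.0, n_paths)
        ret = rng.lognormal(mean=0.03, sigma=0.15, size=n_paths)
        capital = capital * ret + premium - claims
        ruined |= capital < 0.0
    return capital, ruined

cap, ruined = simulate_capital()
print("mean terminal capital:", cap.mean())
print("5% / 95% quantiles:", np.quantile(cap, [0.05, 0.95]))
print("ruin probability over horizon:", ruined.mean())
```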

82. Risk Valuation of Bankruptcy of a Bank's Customer: An Application

Pietro Piu, Alessandro Quinti and Luigi Sani, (LIASA, Università di Siena Dip. Metodi Quantitativi), 53100 Siena.

The paper deals with the analysis of insolvency of the customers of a bank in Central Italy, considering the classes of Consumers and Small & Middle Enterprises. The purpose is to characterise criteria that allow scoring classes to be defined and that minimise the risk of misclassifying a customer.

To this end, a new methodological approach is proposed to design an Artificial Neural Network (ANN) optimised with Bayesian techniques, alongside a Bayesian classifier model and a logistic one; the ROC area criterion has been used to compare these different models. The last two models are recommended by the Basel Committee - one of the main international references for European banks - for the implementation of credit scoring and rating models for credit risk evaluation.

These methods have been implemented to estimate their respective forecasting ability within a time horizon of 12 months. The features used are extracted from the institutional archives (Archivio Rischiosità Statistica, Flussi di Ritorno Centrale Rischi) of the bank. The best results have been achieved by the Bayesian Artificial Neural Network. This model has made it possible to build a "Decision Support System" that identifies the Grey Area, an area of uncertainty, and classifies the remaining data with good accuracy. In the test set the model identifies a Grey Area of 22% for Customers and 25% for Small & Middle Enterprises; on the remaining data the accuracy is 91% for the Customers' class and 83% for the Small & Middle Enterprises' class.
Key Words: Credit scoring, Rating, Bayesian Artificial Neural Networks.
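
For readers unfamiliar with ROC-based model comparison, the following sketch shows one standard way to compute the area under the ROC curve (via the rank-sum identity) for two competing scoring models; it is generic textbook code, not the authors' criterion or data, and the scores and labels are made up.

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC area via the rank-sum (Mann-Whitney) identity:
    AUC = P(score of a random defaulter > score of a random non-defaulter)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort() + 1          # ranks; ties ignored for brevity
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    rank_sum_pos = ranks[y_true == 1].sum()
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical scores from two competing models on the same test set
y = np.array([0, 0, 1, 0, 1, 1, 0, 1])
model_a = np.array([0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.3, 0.9])
model_b = np.array([0.2, 0.3, 0.6, 0.1, 0.5, 0.9, 0.4, 0.7])
print("AUC model A:", roc_auc(y, model_a), "AUC model B:", roc_auc(y, model_b))
```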

97. Analysis with the CIR Model of the Italian Treasury Market

Luca Torosantucci, Istituto per le Applicazioni del Calcolo, M. Picone, CNR, Rome, Italy, Adamo Uboldi, Dipartimento di Matematica, G. Castelnuovo, University La Sapienza, Rome, Italy

We estimate, by means of the CIR model, the short-term riskless rate of the Italian Treasury Market for the years 1999-2000, following the analysis made by Barone et al. for the years 1984-1989. The Datastream archive gives us data only on active bonds, so there is a lack of information on short-term bonds: as many practitioners use Datastream, our goal is to verify how much this lack of information influences the market analysis of an individual operator.

We compare the results obtained by using the CIR model and the Nelson method on the same set of data.
Some remarks on our analysis are as follows:
1) Transcription errors and the illiquidity of certain bonds can heavily influence the construction of the term structure. By applying the Chauvenet principle, we propose a method to reject this kind of data.
2) The distribution of the differences between real and theoretical values is not normal, showing large skewness and large kurtosis, typical of "fat-tailed" distributions.
3) The term structure obtained by our analysis behaves differently from the one obtained by Barone, mainly due to the significant changes in the macroeconomic framework and to the different expectations of investors. For example, the implied volatility in our period is one third of Barone's, which indicates higher stability of the market.
4) Although there is a serious lack of data for the short-term period in the database, the CIR model shows quite good agreement with the medium-to-long-term output given by the Nelson method.

Summing up, our work can help bond market operators in managing the term structure, showing market changes during the years and giving a statistical analysis of the differences between real and simulated returns.
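
As background, a minimal simulation sketch of the CIR short-rate dynamics dr = kappa*(theta - r) dt + sigma*sqrt(r) dW is given below, using a full-truncation Euler scheme. The parameter values are purely illustrative and unrelated to the estimates discussed in this paper.

```python
import numpy as np

def simulate_cir(r0=0.03, kappa=0.5, theta=0.04, sigma=0.08,
                 dt=1 / 252, n_steps=252, n_paths=5, seed=0):
    """Full-truncation Euler scheme for the CIR model
    dr = kappa*(theta - r)*dt + sigma*sqrt(max(r, 0))*dW."""
    rng = np.random.default_rng(seed)
    r = np.full(n_paths, r0)
    path = np.empty((n_steps + 1, n_paths))
    path[0] = r
    for t in range(1, n_steps + 1):
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        r = r + kappa * (theta - r) * dt + sigma * np.sqrt(np.maximum(r, 0.0)) * dw
        path[t] = r
    return path

paths = simulate_cir()
print("terminal short rates:", paths[-1])
```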

Process modelling and control

8. Conditional Independence and General Factorisations in Times Series Graphical Models

Speaker: Dr Rob Deardon, University of Warwick, UK
Co-authors: Professor Henry P Wynn, University of Warwick, UK and Professor Peter E Caines, McGill University, Canada

This work is a contribution to the recent research programme fusing together graphical models and time series, that is, graphical models in which every node is a time series. The main task is to derive conditions for conditional independence. This paper concentrates on the stationary Gaussian case. Two cases are distinguished: global conditional independence, when two whole (past, present, future) time series X, Y are conditionally independent given a whole third series Z, and local, in which the present of X and Y (at time t) are conditionally independent given the past of Z (time < t).
A comparison is made between the local and global conditions, and computations are carried out for autoregressive processes. This work is then applied to data from complicated industrial processes (e.g. waste water treatment plants).

37. Stochastic Modelling and Simulation of Powder Transformation by Nucleation and Growth

Celine Helbert, Ecole Nationale Superieure des Mines de St Etienne, France

The thermal decomposition of powders is a common industrial reaction; lime fabrication is one example. It is therefore important to understand how product quality evolves with temperature and pressure. In the case of lime fabrication, the transformation proceeds by nucleation at the surface of the grain and growth inward (Mampel), so it is important to determine how the characteristic values of nucleation and growth behave with temperature and pressure.

Using realistic assumptions we can prove that nucleation is a space-time Poisson process and that growth is a deterministic, spatially homogeneous process. Thanks to this model we can give the fraction transformed of a single grain, and of a powder of grains whose shapes can be considered as a random set. The model is therefore interesting because of its adaptability to all kinds of external conditions and because of its independence with respect to geometry. One way to obtain the fraction transformed of the powder numerically is to use Monte Carlo simulation. The only drawback of this method is the slowness of the simulations; the issue is then to reduce the simulation variance. Moreover, in the case of parameter identification, we study the sensitivity of the estimation to different effects: model error, measurement error, geometry and size distribution. This allows us to give confidence intervals for the parameters.

Our final aim is to develop software for parameter identification. This software will be used by university research laboratories and possibly by industrial research laboratories.
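
To make the Monte Carlo idea concrete, here is a heavily simplified sketch (not the authors' software): nuclei appear on the surface of a spherical grain as a space-time Poisson process, transformed regions grow at a constant speed, and the fraction transformed at time t is estimated from uniformly sampled test points. The rates, growth speed and grain radius are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def fraction_transformed(t, R=1.0, gamma=2.0, v=0.3, n_points=20_000):
    """Monte Carlo estimate of the transformed volume fraction of a spherical grain
    of radius R at time t. Nuclei appear on the surface as a space-time Poisson
    process with areal rate gamma; each transformed region grows at speed v."""
    # sample nucleation events on [0, t] x sphere surface
    area = 4.0 * np.pi * R ** 2
    n_nuclei = rng.poisson(gamma * area * t)
    if n_nuclei == 0:
        return 0.0
    tau = rng.uniform(0.0, t, n_nuclei)                        # nucleation times
    z = rng.normal(size=(n_nuclei, 3))
    sites = R * z / np.linalg.norm(z, axis=1, keepdims=True)   # uniform points on the surface
    # sample test points uniformly in the ball
    u = rng.normal(size=(n_points, 3))
    radii = R * rng.uniform(0.0, 1.0, n_points) ** (1.0 / 3.0)
    pts = radii[:, None] * u / np.linalg.norm(u, axis=1, keepdims=True)
    # a point is transformed if it lies within the growth front of some nucleus
    d = np.linalg.norm(pts[:, None, :] - sites[None, :, :], axis=2)
    covered = (d <= v * (t - tau)[None, :]).any(axis=1)
    return covered.mean()

for t in (0.5, 1.0, 2.0):
    print(f"t = {t}: fraction transformed ~= {fraction_transformed(t):.3f}")
```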

55. A Postmortem of Out-of-Control Signals from a Control Chart

Tirthankar Dasgupta, Statistical Quality Control and Operations Research Unit
Indian Statistical Institute, Baroda, Gujarat, India

Conceived by Walter Shewhart and popularized by W.E. Deming, statistical process control (SPC) is a tool that has widely been used in all sorts of industries across the globe. The extent of theoretical research in this field, especially on control charts, has been phenomenal during the past few decades. However, experiences of practitioners and survey results (Kelly and Drury, 2002) confirm that owing to a strong conflict between reaction to out-of-control situations and ability to meet production schedules, a de-evolution of SPC is observed in several organizations. Research shows that many industrial personnel find the requirement of stopping the production process till the so-called assignable cause is identified and removed disturbing, particularly when the product characteristic is within specification limits. Thus, quite often the daily SPC exercise consists of merely plotting points in the control chart without any action whatsoever and at the end of the day filing it as an evidence of compliance to standardized procedures. To prevent this de-evolution of SPC, Kelly and Drury suggest reengineering the functional responsibilities associated with it. However, it is felt that in order to restore the faith of such users in SPC as a tool for reduction of process variation, it is extremely important to suggest a methodology for detection of causes behind out-of-control conditions at a later stage.

In this article, a framework is suggested that will ensure detection of assignable causes from a control chart long after the production process is over. In this framework, all important process parameters that may affect the product characteristic to be charted are identified through brainstorming prior to implementation of the control chart. The check sheet for collection of control chart data has provisions for recording each of those process parameters for each subgroup. A control chart is thus backed by a powerful database that is expected to explain its behaviour. Suitable non-parametric statistical tests of hypothesis and certain multivariate statistical methods can prove useful in analyzing such a database to confirm suspected assignable causes.

The proposed methodology is demonstrated with two case examples - one from a polymer industry and the other from an automobile components industry.
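
One simple way to implement the suggested nonparametric confirmation step is sketched below, under the assumption that the check-sheet database records a candidate process parameter for every subgroup: a Mann-Whitney rank test compares that parameter between out-of-control and in-control subgroups. The data and the variable name oven_temp are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical control-chart database: one row per subgroup, with the charted
# statistic's status and a candidate process parameter recorded on the check sheet.
rng = np.random.default_rng(2)
status = np.array(["in"] * 40 + ["out"] * 8)
oven_temp = np.concatenate([rng.normal(180.0, 2.0, 40),    # in-control subgroups
                            rng.normal(184.0, 2.0, 8)])    # out-of-control subgroups

stat, p_value = mannwhitneyu(oven_temp[status == "out"],
                             oven_temp[status == "in"],
                             alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Oven temperature differs between out-of-control and in-control subgroups:")
    print("a candidate assignable cause worth investigating.")
```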


65. MEASURING AND IMPROVING SERVICE QUALITY IN THE FRAMEWORK OF A LOYALTY PROGRAMME: A CASE STUDY

Irena Ograjenšek, University of Ljubljana, Faculty of Economics, Kardeljeva plošcad 17, 1000 Ljubljana, Slovenia

Service industries embraced the basic quality improvement ideas simultaneously with the manufacturing sector, but have been neglecting the use of statistical methods in quality improvement processes even more than their manufacturing counterparts.
Differences in the nature of services and manufactured goods have always been emphasised in the literature as the raison d'être for this state of affairs, especially with regard to the measurability of service quality attributes and, consequently, the characteristics of the measurement process.

In this paper, some prerequisites for a change in the service sector's outlook on the use of statistical methods in quality improvement of services are addressed by means of a case study. Chief among the prerequisites is the possibility of unambiguously identifying an individual customer. Although several ways to do so exist, in the framework of this paper the emphasis is given to so-called loyalty programmes.

Ever since the advent of smart loyalty cards, loyalty programmes have been transcending their traditional role as creators of exit barriers. Presently, they should be regarded primarily as facilitators of customer data collection. By combining objective facts (customer purchase data) and subjective opinions and attitudes (customer survey data), they are shown to be instrumental in developing effective service quality measures to be used in continuous quality improvement of services.

67. Strategies of Portfolio Selection Based on Quantitative Analysis of Market Data

Rainer Göb, Institute for Applied Mathematics and Statistics, University of Würzburg, Germany
Strategies of portfolio selection based on quantitative analysis of market data are discussed in mathematical finance. However, many of the strategies suggested are based on relatively complicated models, and they are thus difficult to implement in investment practice. Relatively simple approaches to portfolio selection based on long-term estimation of stock returns were discussed in the 1990s. We consider a version of this approach based on an EWMA performance indicator of stocks. An empirical study considering stock data from 1990 until 2000 shows that portfolios selected by this method outperform market indices. Further refinements using additional intervention strategies adopted from statistical process control can be used to reduce the portfolio variance.
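
A minimal sketch of the kind of EWMA performance indicator mentioned above is given below, assuming a matrix of historical returns; the smoothing constant, the data and the top-3 selection rule are illustrative choices, not those of the empirical study.

```python
import numpy as np

def ewma_indicator(returns, lam=0.1):
    """Exponentially weighted moving average of past returns, one value per stock.
    returns: array of shape (n_periods, n_stocks)."""
    score = np.zeros(returns.shape[1])
    for r_t in returns:
        score = lam * r_t + (1.0 - lam) * score
    return score

# Hypothetical monthly returns for 6 stocks over 60 months
rng = np.random.default_rng(3)
returns = rng.normal(0.005, 0.05, size=(60, 6))

score = ewma_indicator(returns, lam=0.1)
top_k = np.argsort(score)[::-1][:3]        # select the 3 best-performing stocks
print("EWMA scores:", np.round(score, 4))
print("selected portfolio (stock indices):", top_k)
```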

72. Application of Multiscale Strategies to Industrial Process Data Analysis

Marco P. Seabra dos Reis and Pedro M. Saraiva, Department of Chemical Engineering, University of Coimbra

Industrial process data typically raises many difficulties when one tries to extract useful information from large databases for tasks such as monitoring, optimisation, safety improvement, emissions control, etc. At the same time, these very same difficulties are challenges for the development of new data analysis methodologies.

Some of the characteristics of industrial process data that lead to practical difficulties in their exploration are as follows:
* the complex multivariate nature of the process variables (eigenstructure);
* their autocorrelated behaviour (dynamic structure);
* the existence of variables with different acquisition rates and missing data (sparse structure);
* the existence of superimposed (equally relevant) dynamical features, either in time or in frequency (multiscale structure);
* the time delays between the actuation times of the various variables on the product under processing (time-delay structure);
* the existence of corrupted data (failure structure).

Some techniques have already been proposed to extract and identify the underlying structures from data with the above features. For instance, multivariate data analysis tools such as PLS, PCA or PCR identify linear relationships across variables (i), with extensions to non-linear modelling and dynamic processes (ii). Strategies are also available for missing data (v) and for the identification of outliers.

However, all of the above techniques typically look preferentially at data at its finest scale, to the detriment of other (time-frequency) analysis windows that also convey relevant information. Furthermore, gross trends are often overlooked, even when they are the most important features, in favour of short-time dynamic behaviour.

In our article we will present approaches based upon wavelets that are able to handle multiscale process structures, and can easily be integrated with other approaches. Results obtained from the application of such approaches to real sets of industrial data collected from a Portuguese plant will also be presented and discussed.
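
As an indication of what a wavelet-based multiscale treatment looks like in practice, the sketch below (assuming the PyWavelets package is available; the signal and wavelet choices are arbitrary) decomposes a process signal into scales and separates a coarse trend from fine-scale detail. It illustrates the general technique only, not the authors' methodology.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

# Hypothetical process signal: slow drift + oscillation + noise, sampled every minute
t = np.arange(1024)
x = 0.002 * t + 0.5 * np.sin(2 * np.pi * t / 64) \
    + np.random.default_rng(4).normal(0, 0.2, t.size)

# Multiresolution decomposition with a Daubechies wavelet
coeffs = pywt.wavedec(x, 'db4', level=4)          # [cA4, cD4, cD3, cD2, cD1]

# Coarse-scale approximation: keep cA4, zero all detail coefficients
approx = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
trend = pywt.waverec(approx, 'db4')

# Finest-scale details: keep only cD1
fine = [np.zeros_like(c) for c in coeffs[:-1]] + [coeffs[-1]]
noise_like = pywt.waverec(fine, 'db4')

print("variance by scale:", [float(np.var(c)) for c in coeffs])
print("trend captures", round(100 * np.var(trend) / np.var(x)), "% of total variance")
```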

80. An Engineering Multivariate Statistical Process Control System to Monitor a Wastewater Treatment Plant Process

Barceló, S. and Capilla, C., Polytechnic University of Valencia, 46022 Valencia, Spain

This paper describes the development of a process control system based on statistical methods to monitor effluent quality in a wastewater treatment plant. The most important output quality variable studied is the biological oxygen demand, whose analytical value is known only five days after sampling. Another important output quality variable is the suspended solids concentration, which can be used as a good predictor of the biological oxygen demand and has the advantage that its value is known sooner. The input variables influencing the output quality parameters are mainly the raw sewage flow rate and the input biological oxygen demand at the treatment plant.

These quality variables, which are important for the studied process, show dependence in time in the input and output water flows, as is commonly the case in this type of process. The temporal evolution as well as the dynamic relationships are analysed for all the relevant variables. In this context, it is inefficient to monitor the univariate quality parameter series separately using traditional Statistical Process Control techniques, which do not take advantage of the relevant information contained in the autocorrelation structure and in the dynamic dependencies of the quality variables. This information allows systems of dynamic prediction to be established for quality variables whose analytical values are delayed in time. We discuss a case study of the application of multivariate quality control charts to detect on-line problems of the appropriate magnitude. Using this approach, it is possible to implement a quality control system integrating Engineering Process Control and Statistical Process Control approaches to monitor the quality of the effluent flow in the treatment plant.

KEYWORDS: Engineering Process Control, Statistical Process Control, Wastewater Treatment Plant, Time-Series Analysis.
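
As a generic illustration of multivariate monitoring of correlated quality variables (not the integrated EPC/SPC system of this paper), the sketch below computes Hotelling T-squared statistics with an approximate F-based limit; the effluent-quality data are simulated and the variable choices are hypothetical.

```python
import numpy as np
from scipy.stats import f

def hotelling_t2_chart(X, alpha=0.0027):
    """Hotelling T-squared statistics for individual multivariate observations,
    with an F-based upper control limit (strictly, the limit for monitoring new
    observations with estimated mean and covariance)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mean = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mean
    t2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
    ucl = (p * (n - 1) * (n + 1)) / (n * (n - p)) * f.ppf(1 - alpha, p, n - p)
    return t2, ucl

# Hypothetical effluent-quality data: suspended solids, BOD and flow rate (n x 3)
rng = np.random.default_rng(5)
X = rng.multivariate_normal([30.0, 25.0, 1200.0],
                            [[9, 4, 30], [4, 16, 40], [30, 40, 2500]], size=100)
t2, ucl = hotelling_t2_chart(X)
print("UCL:", round(ucl, 2), "- out-of-control observations:", np.where(t2 > ucl)[0])
```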

81. An Application of Multivariate Analysis to Customer Satisfaction Measurement

Joaquín Cestero, Susana Barceló and Mónica Martínez, Polytechnic University of Valencia, 46022 Valencia (Spain)

This paper shows how multivariate analysis techniques can be applied to the measurement of customer satisfaction. The case described is part of an ISO 9001:2000 implementation project in a public employment agency. The objective of the study was to determine the level of satisfaction that the service produced in its users (companies and job seekers), both in terms of overall satisfaction and for individual service attributes. It was also intended to determine how the service attributes influenced overall satisfaction.

A factor analysis was carried out on the ten service attributes initially identified, to determine whether relationships existed between them that would allow their number to be reduced in future studies. First, the number of influential factors was determined. The next step was to determine which attributes composed each factor, using different rotation methods (varimax, equamax).

Multiple regression analysis was used to determine the importance of each attribute for overall satisfaction and to identify the most important ones (key drivers). Special attention was given to multicollinearity. The results obtained with this implicit method were contrasted with those of the explicit method.

The results obtained in the study were the basis for implementing improvement actions in the service.

92. Short-Term Road Traffic Forecasting

J.M. Loubes, E. Maza (1), C. Communay (1)(2), J.M. Azais, J.-M. Bardet and F. Gamboa, Université Paul Sabatier, Bât. 1R1-118, Route de Narbonne, 31062 Toulouse cedex 4; (1) division Traffic-First, société IRINCOM; (2) project leader

Road traffic measurements have, for many years, been growing in both quantity and quality [1]. The increasing importance given to these data collections results from the need of traffic operators and road users to better manage journeys within an expanding road infrastructure. This need is clearly identified in many large-scale projects [2] [5].

Our study is focused on the road traffic of the expressways of Paris and its suburbs and, more precisely, on the roads equipped with induction loop detectors. The aim of the study is the forecast, at time H, of the travel time, on a given route, of a user who will take the road at time H+h (h ≥ 0).

We are only interested in average temporal speeds, delivered by the detectors every 6 minutes. Thus, our data have the following form, for each induction loop detector: v_j(p) for day j, j = 1,...,J, and period p, p = 1,...,P.

The studied statistical model is V_j(t) = sum_{k=1..K} Z_{jk} f_k(t) + W_j(t), where the V_j are the random functions of which the v_j are realisations, the functions f_k constitute the traffic profiles, the Z_{jk} ∈ {0,1} are random variables such that sum_k Z_{jk} = 1, and W_j is the random noise function associated with day j.

After having defined a quality protocol for the velocity measurements, our study is divided into two parts: estimating, by a classification method, the various road traffic profiles f_k as well as the number K of profiles [3] [4]; and then using these profiles for the forecast of the travel time.

Two data processing methods, one based on a mathematical classification (model M1) and the other on an a priori classification, i.e. on relevant environmental variables (model M2), are used to estimate the f_k functions. Finally, an expert model E combines, as well as possible, the M1 and M2 models for travel time forecasting.
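
A minimal illustration of the profile-estimation step is sketched below: plain k-means applied to daily speed curves recovers K traffic profiles f_k. It is a generic stand-in, not the M1/M2/E models of the paper, and the simulated speed data are invented.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means on daily speed curves (one row = one day, one column = one
    6-minute period); returns cluster labels and the K traffic profiles f_k."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical data: J = 200 days x P = 240 periods of average speeds (km/h)
rng = np.random.default_rng(6)
peak = 80 - 30 * np.exp(-0.5 * ((np.arange(240) - 80) / 15.0) ** 2)   # morning-peak profile
free = np.full(240, 85.0)                                             # free-flow profile
days = np.vstack([peak + rng.normal(0, 3, 240) for _ in range(120)] +
                 [free + rng.normal(0, 3, 240) for _ in range(80)])

labels, profiles = kmeans(days, k=2)
print("days per profile:", np.bincount(labels))
```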

95. Control Charts: A Cost-Optimization Approach via Bayesian Statistics

Andras Zempleni and Miklos Veber (Department of Probability Theory and Statistics, Eotvos University of Budapest), Belmiro Duarte (Instituto Superior de Engenharia de Coimbra, Portugal) and Pedro Saraiva (Department of Chemical Engineering, University of Coimbra, Portugal)

Control charts are one of the most widely used (and sometimes not most soundly used) tools in industrial practice for achieving process control and improvement. One of the critical issues associated with the correct implementation of such a tool is the definition of control limits and sampling frequencies. Very frequently these decisions are not well supported by sound statistical or economic decision-making criteria, leading to suboptimal use and results.
In our presentation we will describe a new approach for establishing control limits and sampling times, derived from a combination of Bayesian statistics and economic performance criteria. Historical data are used to characterize process mean shifts and to define suitable probability density functions. These functions and Bayesian statistics are then combined with economic performance criteria (cost estimates for false alarms, for not identifying a true out-of-control situation, and for obtaining a data record through sampling) in order to find optimal values for control limits and sampling frequencies. This framework is quite general and flexible, so that it can be applied to most situations where SPC is likely to be a useful tool (including different kinds of probability distributions, multivariate contexts, etc.). In particular, our approach can handle a wide range of prior probability distributions, including exponential and certain IFR (increasing failure rate) functions, which play a major role in reliability studies.
Coupling our problem formulation with efficient optimization algorithms, we obtain an efficient procedure for practical SPC applications, resulting in optimal and sound decisions about control limits and sampling frequency values. We will compare the results obtained through this framework with those obtained with other, more conventional procedures, on both simulated data and real data sets collected from pulp and paper industrial plants.
Acknowledgments: This work was developed by members of the Pro-ENBIS network, which obtained financial support from the EU project GTC1-2001-43031.
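
The sketch below is a drastically simplified stand-in for the economic design idea (not the authors' Bayesian formulation): a grid search over the control-limit width k of an X-bar chart, minimising an approximate per-sample expected cost that combines sampling cost, false-alarm cost and a detection-delay cost averaged over a prior sample of shift sizes. All costs, rates and the prior are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def expected_cost(k, shifts_prior, n=5, c_sample=1.0, c_false=200.0, c_delay=50.0,
                  p_shift=0.02):
    """Very simplified per-sample expected cost for an X-bar chart with limits
    mu0 +/- k*sigma/sqrt(n). shifts_prior holds draws of the shift size delta
    (in sigma units), representing historical/prior knowledge of mean shifts."""
    p_false = 2.0 * (1.0 - norm.cdf(k))                       # signal prob. when in control
    # detection probability for each prior shift, and the resulting average run length
    p_detect = (1.0 - norm.cdf(k - shifts_prior * np.sqrt(n))
                + norm.cdf(-k - shifts_prior * np.sqrt(n)))
    arl1 = 1.0 / np.maximum(p_detect, 1e-12)
    return (c_sample
            + (1.0 - p_shift) * c_false * p_false             # false alarms
            + p_shift * c_delay * arl1.mean())                # expected detection delay cost

rng = np.random.default_rng(7)
shifts = np.abs(rng.normal(1.0, 0.5, 5000))                   # prior sample of shift magnitudes
grid = np.linspace(1.5, 4.0, 26)
costs = [expected_cost(k, shifts) for k in grid]
print("optimal k:", grid[int(np.argmin(costs))], "cost:", round(min(costs), 3))
```

A fuller treatment would also optimise the sampling interval and subgroup size jointly, as the abstract describes; the grid search above only conveys the flavour of the trade-off.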

98. A Comparison of the Out-Of-Control Performance of Shewhart Control Charts Based on Normal and Nonparametric Methods

Thijs Vermaat, Institute for Business and Industrial Statistics (IBIS UvA)
University of Amsterdam, The Netherlands

Statistical Process Control (SPC) is used to control the quality of products. The most important statistical tool is the control chart. Its purpose is to distinguish common and special causes of variation. The classical method to estimate the control limits of the control chart is based on the average moving range. This method assumes that the data are normally distributed. In practice, the outcomes of a process often come from nonnormal distributions. In a recent paper by Ion, Does and Klaassen (2002), different nonparametric methods to estimate the control limits are proposed. The authors have applied these methods to the in-control situation, i.e. when no special causes occur. In this lecture the out-of-control situation is discussed, i.e. we assume a shift in the mean. We will study these methods for a wide variety of nonnormal distribution functions. The comparison of these methods with the classical method (the average moving range) leads to interesting results.
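
For reference, the classical moving-range construction mentioned above can be written in a few lines; this is the standard individuals-chart formula, applied here to simulated data for illustration only.

```python
import numpy as np

def individuals_limits(x):
    """Classical control limits for an individuals chart based on the average
    moving range: CL = x-bar, UCL/LCL = x-bar +/- 2.66 * MR-bar (2.66 = 3/d2, d2 = 1.128)."""
    x = np.asarray(x, dtype=float)
    mr_bar = np.mean(np.abs(np.diff(x)))
    center = x.mean()
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Hypothetical in-control data
rng = np.random.default_rng(8)
x = rng.normal(10.0, 1.0, 50)
lcl, cl, ucl = individuals_limits(x)
print(f"LCL={lcl:.2f}  CL={cl:.2f}  UCL={ucl:.2f}")
```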

100. An Attribute Control Chart Based On Logistic Regression

R. Noorossana, Iran University of Science & Technology, Tehran 16844 Iran and A. Sadeghi, Iran Khodro Auto Industry, Tehran 13954 Iran

Attribute control charts have historically been used assuming 3-sigma limits and independent observations. When this approach is applied to a process with a trend then many false alarms are generated and the performance of the chart deteriorates significantly. A control chart for attribute data based on logistic regression is proposed to eliminate trends and ultimately improve the performance of a p-chart. A numerical example is discussed to investigate the impact of the proposed chart.

Key Words and Phrases: Statistical Process Control, Logistic Regression, Trend Analysis, Attribute Control Chart

Reliability and safety

10. A Case Study in Applying Statistics to Find Simulations of In-field Conditions via Laboratory Tests: Making Burglary More Difficult

D.J.Stewardson, S.Y.Coleman, Industrial Statistics Research Unit, University of Newcastle upon Tyne, UK


This case study examines the application of statistical techniques to the problem of simulating in-field conditions by laboratory tests. There are many situations where one may wish to assess a product's usefulness, or its ability to cope with in-use conditions, without actually witnessing these directly. Examples of this are accelerated lifetime testing, fatigue testing and pressure testing. In all of these the idea is to use some standard test in place of the real-life conditions. Sometimes the test produces a direct comparison, such as in stress testing of steel structures. In others the test results do not, but are used as a general indication of product quality that is related to the conditions seen in practice, such as in pipeline pressure testing or steel railway fatigue testing. In this study, a direct comparison is made between the ability of experienced burglars to break through double-glazed windows and the tests that were used as a substitute for this activity. Perhaps surprisingly, there is a British Standard for a time-to-break-in test (using real burglars) that establishes whether double glazing meets the minimum required security level. There are certain obvious disadvantages to this procedure, not least the fact that the burglars learn which types of window are easiest to break into. The client company, who manufacture security beading that reduces the chance of a successful break-in, wanted to dispense with the services of its consultant burglars and replace them with laboratory tests, but was not sure how to achieve this. They had three tests, but none appeared satisfactory. We show how the problem of matching them was overcome, and the client's problem was solved, using multiple regression of historical data.

34. A Demonstration of the Need for More Sophisticated Statistical Methodology in Setting and Applying European Standards for Safety Clothing and Equipment

D.J.Stewardson and S.Y.Coleman, Industrial Statistics Research Unit, University of Newcastle upon Tyne, UK

This paper looks at the need for the use of more sophisticated statistics within European standards, particularly those concerned with the production of safety clothing and equipment. Currently these tend to go no further than assessments based on two broad components of error, repeatability and reproducibility. This can lead to very large confidence bands around critical values due to the often large numbers of contributory factors contained within the reproducibility component. In a number of examples it can be easily shown that far better precision could be achieved by allowing identification of the components due to these different, and often clearly identifiable, factors. Some standards also call for removal of large amounts of data from the analysis when a single outlier is present. This is shown to be both wasteful of resources and bad statistical policy. The thrust of the paper leads to the idea of replacing some standards that rely on reference materials with standards based on statistical models, ones that can be used as a benchmark against which new or changing circumstances can be judged. This idea, if adopted, could revolutionize the use of product and performance standards in the future. Put simply, real materials vary and can and do deteriorate over time; mathematical models do not.

42. Optimal Designs for Accelerated Runs; Estimation of Life Time

Professor Giorgio Celant, Universita di Padova, Italy
An optimal design is presented for the estimation of the extrapolated mean lifetime of a system observed under stressed conditions. Optimality is defined in terms of the MSE.

44. Bayesian Analysis and Prediction of Failures in Underground Trains

Antonio Pievatolo, Raffaele Argiento and Fabrizio Ruggeri, Milan, Italy

In this paper we analyse the case of a public transportation company wishing to check the actual reliability of the door-opening system of its subway trains during the early operating period. This activity is aimed at protecting the company against an unexpected increase in the number of failures compared to what is declared in the warranty accompanying the purchase contract. For obvious reasons, this comparison should be carried out before the expiration of the warranty, by adding the predicted number of failures up to the expiration date to the number observed at the time of the analysis. The available data are the failure date and the odometer reading for each failure of 40 trains during an eight-year period. Assuming that a minimal repair strategy is adopted, a Poisson process model on the time scale describes the sequence of failures. Time must be considered in order to take into account an evident seasonal effect, whereas the kilometres run are incorporated within the intensity function as a random function of time, modelled as a gamma process.
Bayesian inference is then carried out via Monte Carlo simulation, obtaining prediction intervals for the expected number of failures during periods of desired length, using only part of the data. The predictions are then compared with the observed data.

47. Analysis Tools for Competing Risk Failure Data

Cornel Bunea and Roger Cooke, TU Delft, Mekelweg 4, Delft, The Netherlands
Bo Lindqvist, NTNU Trondheim, N-7491 Trondheim, NORWAY
Modern Reliability Data Bases (RDBs) are designed to meet the needs of diverse users, including component designers, reliability analysts and maintenance engineers. To meet these needs, RDBs distinguish a variety of ways in which a component's service sojourn may be terminated. Until quite recently, such data were analyzed from the viewpoint of independent competing risk. Independence is often quite implausible, as, e.g., when degraded failures related to preventive maintenance compete with critical failures. The maintenance crew is trying to prevent critical failures while losing as little useful service time as possible, and is hence creating dependence between these competing risks. We have recently learned how to use simple models for dependent competing risk to identify survival functions and hence to analyze competing risk data. This type of analysis requires new statistical tests and/or adaptations of existing tests. Competing risk models are described in [Cooke and Bedford 2002]. In this paper we present a number of tests to support the analysis of competing risk data.
Competing risk data may be described as a colored point process, where each point event is described by a number of properties, and where a coloring is a grouping of properties into mutually exclusive and exhaustive classes. For example, a maintenance engineer is interested in degraded and incipient failures, as they are associated with preventive maintenance. He also tries to take the least expensive maintenance action: repair or adjustment actions are favored above replacement. Critical failures are of primary interest in risk and reliability calculations, and a component designer is interested in the particular component function that is lost and in the failure mechanisms, and wishes to prevent the failure of the most expensive components of the system.
In addition to this, two other main operations may be performed on the data: superposition and pooling. Time histories having the same begin and end points may be superposed. The set of event times of the superposition is the union of the times of the superposed processes. In general, superposition is performed in order to obtain a renewal process. If the maintenance team returns components to service as good as new, then all time histories of the components should be superposed.
The pooled data are considered as multiple realizations of the same random variable or stochastic process. When time histories are pooled, they are considered as realizations of the same (colored) point process. In general, pooling is performed on identical independent point processes in order to obtain better statistical estimates of the inter-event distribution. To perform these operations on the data, a set of questions arises, each requiring statistical tests:
Are the time histories homogeneous and independent?
Independence will fail, if the events for the time histories of the components tend to cluster in calendar time. If homogeneity fails, the uncolored events should not be regarded as realizations of the same point process. If homogeneity holds, then the number of events up to time t for every component should not differ significantly.
Is the coloring stationary?
The pooled process is now considered as colored. The coloring is stationary if the proportion of ``red" and ``green" points does not vary significantly with calendar time.
Is the process ``color blind" competing risk?
The process is color blind if the distribution of the i-th event is independent of the color of the previous event. Color blindness implies that the processes obtained by splicing together all inter-event times beginning with color j, j = 1,...,n, are homogeneous.
Is the uncolored process stationary competing risk?
Is the uncolored process renewal competing risk?
In order to answer these questions, a number of statistical tests are presented, together with an application to a data set from two identical compressor units at a Norsk Hydro ammonia plant, for the observation period 2-10-68 to 25-6-89.

Web mining and analysis

14. Statistical Models for the Forecast of the Visit Sequences on Web Sites

Magda Rognoni, Paolo Giudici - University of Pavia

For an e-commerce site, knowledge of its users' profiles is fundamental for attaining the different goals the site sets out to reach. The aim of this Master's thesis is the construction of a statistical model (a Markov chain model) for forecasting visit sequences on Web sites, that is, for forecasting which page a visitor, knowing where she/he is at this instant, will click in the following instant, without considering the past path.

First, a cluster analysis of the visitors of a German computer site was carried out. We then analysed in detail the characteristics of each of the four clusters obtained, chose one cluster and built the Markov chain model for it. From the analysis of the Markov transition matrices obtained, we identified the behaviour of the visitors in particular situations, such as the entry point into the site and the most probable exit point. We also found the most probable visit path and a few characteristics of the "typical" pages of the site. This part of the analysis has been developed and implemented with the SAS/Base software (R).

We then compared our results with association rules. In particular, the analysis of direct sequences of two pages was carried out using the algorithms and statistical indexes already implemented in the SAS/Enterprise Miner (TM) software, that is, the Support and Confidence indexes.
Comparing these with the results obtained previously with the Markov chain method, we identified the analogies and the differences between the two approaches.
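
A minimal sketch of how a first-order Markov transition matrix can be estimated from clickstream sessions is given below; it is generic illustration code (the page labels and sessions are invented), not the SAS/Base implementation used in the thesis.

```python
import numpy as np

def transition_matrix(sessions, n_pages):
    """Maximum-likelihood estimate of a first-order Markov transition matrix from
    observed page sequences (each session is a list of page indices)."""
    counts = np.zeros((n_pages, n_pages))
    for s in sessions:
        for a, b in zip(s[:-1], s[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical sessions over pages 0=home, 1=products, 2=support, 3=checkout
sessions = [[0, 1, 3], [0, 1, 1, 3], [0, 2], [0, 1, 2, 3], [0, 2, 1, 3]]
P = transition_matrix(sessions, n_pages=4)
print(np.round(P, 2))
print("most probable page after 'products':", int(P[1].argmax()))
```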

41. Association Rules for Web Clickstream Analysis

Paolo Giudici, Facolta di Economia, Universita di Pavia, Italy
In this talk we compare association rules, based on the Apriori algorithm, with more statistical methodologies for detecting local association patterns. The latter include probabilistic expert systems as well as Markov chain models. The methodologies are compared and evaluated on the basis of a real dataset concerning an e-commerce site.
The conclusion is that, while local models are well suited to analysing large databases, classical statistical methods still have a lot to say in this field.

51. Some modelling issues in Web Mining

Luca La Rocca and Lilla Di Scala, Universita degli Studi di Pavia, Italy

We discuss some modelling issues which arise in the analysis of clickstream data, such as that originating from Internet Web traffic. In particular, we focus our attention on a site-centric scenario, in which surfers' paths within a site are logged by the server but no information is available regarding surfers' actions outside the site. This discussion was prompted by the opportunity we had of working on the log-files of an e-commerce site with the aim of studying the relationship between surfers' behaviour and their attitude towards on-line purchasing. Having decided that the behaviour of surfers is well represented by their transitions from page to page, that is by the hyper-links they follow, we consider a Markovian model as our starting point.
Due to the site-centric nature of the problem, there is a need to make inference on the non-observable transitions in and out of the site; we therefore suggest adding to the model a latent page which serves to represent the rest of the Web. We then discuss how to investigate the length of the surfers' memory, suggesting that Mixture Transition Distribution models can help increase the memory lag while keeping down the dimension of the parameter space.
Finally, we take into consideration surfers' heterogeneity and tackle the problem of deciding how many components a finite mixture of Markov chains should have in order to properly model a given population of surfers.

Data mining

16. Multivariate Process Control to Improve the Quality of Batch PPOX Production

Manuel Zarzo Castelló, Alberto Ferrer Riquelme and Rafael Romero Villafranca, Universidad Politecnica de Valencia, Spain

The generalised use of electronic sensors for automatic control systems in chemical processes provides on-line information about the variables involved in the process, generating a huge amount of data that is rarely exploited statistically. Once the product has been produced, some quality parameters are analysed off line, usually in the laboratory, and their values should be optimised to improve quality. It is very interesting to take advantage of all this information, available on line and off line, to detect changes in the process and, mainly, to identify causes of variability (critical points).

From a chemical process that produces PPOX (polypropylene oxide) 31 batches have been studied. Every batch lasts about 24 hours, consists of four stages and several substages, and 52 variables are controlled every minute. At the end of the batch the residual content of water is analysed off line. This quality parameter should be kept as low as possible.

From the process variables a matrix of 9133 aligned variables by 31 batches has been built. A multivariate statistical analysis with PCA and PLS (handling more variables than observations) has been conducted, applying several approaches to variable selection. The residual content of water turns out to be correlated with the temperature profile during the dehydration stage of PPOX. A multivariate control model has been built. This model is able to detect, on line, out-of-control situations related to temperature upsets that will increase the water content unless corrective measures are taken.

43. Data Mining Quality Control Data: A Case Study

P. Costantini, S. M. Pagnotta, and G.C. Porzio, Universita degli Studi di Cassino, Italy and G. Ragozini, Universita degli Studi di Napoli Federico II, Italy

An Italian factory controls its production process through about 70 characteristics for each product, with over 1000 products a day. This huge data set (roughly half a million values a week) calls for data mining of quality control data, and it raises questions about how to extract relevant information within a multivariate quality control setting. In particular, we first discuss a few simple methods useful for a preliminary graphical exploration of the process. Then we give special focus to the quality control index currently used by the factory managers: the index itself is composed and decomposed over variables, products and time in order to gain insight into the process.

Furthermore, to obtain a tool more sensitive than current practice, we propose a CUSUM-type control chart based on the empirical distribution of this index. The tool yields a visual representation of the state of the process, allowing appropriate interventions to be decided upon. Finally, the work is completed with a note on issues related to the way quality measurements are currently taken at this factory.
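
For orientation, the standard tabular CUSUM recursion is sketched below; note that it is the textbook version for detecting a mean shift, not the empirical-distribution variant proposed in the paper, and the quality-index data are simulated.

```python
import numpy as np

def cusum(x, target, k, h):
    """Tabular CUSUM for an upward shift: C_i = max(0, C_{i-1} + x_i - (target + k)),
    signalling when C_i exceeds h."""
    c = 0.0
    alarms = []
    for i, xi in enumerate(np.asarray(x, dtype=float)):
        c = max(0.0, c + xi - (target + k))
        if c > h:
            alarms.append(i)
    return alarms

# Hypothetical daily values of the factory's composite quality index
rng = np.random.default_rng(9)
index = np.concatenate([rng.normal(1.0, 0.1, 60), rng.normal(1.08, 0.1, 20)])
print("first alarm at day:", cusum(index, target=1.0, k=0.05, h=0.4)[:1])
```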

46. A Statistical Model to Predict Gift Patterns in Large Distribution Promotions

Federica Cornalba - University of Pavia, internship at CoRo Marketing Activities Srl
Prof. Paolo Giudici - University of Pavia, Dr. Andrea Lazzarini - Head of Promotions, CoRo Marketing Activities Srl, (Via Pirandello, 21/A PIACENZA - Tel.: 0523/482124 - Fax: 0523/489882); Dott.ssa Paola Rebecchi - Head of statistics, CoRo Marketing Activities Srl;

We consider the problem of predicting the take-up of gifts from a promotional catalogue for a chain of hypermarkets located in Northern Italy.
The aim of this paper is to compare experiential and statistical methods to achieve a reliable predictive model.

We build a predictive statistical model using Microsoft Excel. In this way we obtain a combined data-processing and statistical instrument that allows us to join the evaluation of the data with the commercial needs of the large-scale retail trade (GDO). We underline the role of performance evaluation data in problem-solving methods.

50. Comovements and Contagion in Emergent Markets: Stock Indexes Volatilities

Hedibert Freitas Lopes and Helio Santos Migon, Brazil
The past decade has witnessed a series of (well accepted and defined) financial crisis periods in the world economy. Most of these events are country specific and eventually spread across neighbouring countries, with the concept of vicinity extrapolating from geographic maps and entering contagion maps. Unfortunately, what contagion represents and how to measure it are still unanswered questions.
In this article we measure the transmission of shocks by cross-market correlation coefficients, following Forbes and Rigobon's (2000) notion of shift-contagion. Our main contribution relies upon the use of traditional factor model techniques combined with stochastic volatility models to study the dependence among Latin American stock price indexes and the North American index. More specifically, we concentrate on situations where the factor variances are modelled by a multivariate stochastic volatility structure.
From a theoretical perspective, we improve currently available methodology by allowing the factor loadings, in the factor model structure, to have a time-varying structure and to capture changes in the series' weights over time. By doing this, we believe that the changes and interventions experienced by those five countries are well accommodated by our models, which learn and adapt reasonably fast to those economic and idiosyncratic shocks.
We show empirically that the time-varying covariance structure can be modelled by one or two common factors and that some sort of contagion is present in most of the series' covariances during periods of economic instability, or crises. Open issues regarding real-time implementation and natural model comparisons are thoroughly discussed.

63. Clustering Financial Time Series: An Application to Mutual Funds Style Analysis

Francesco Pattarin, Dipartimento di Economia Aziendale, Modena, Sandra Paterlini, Dipartimento di Economia Politica, Modena, Tommaso Minerva, Dipartimento di Scienze Sociali, Reggio Emilia, Italy

Classification can be extremely useful in giving a synthetic and highly informative description of contexts characterized by high degrees of complexity. Different approaches can be adopted to tackle the classification problem: statistical tools may contribute to increasing the degree of confidence in the classification scheme. In this work we propose a classification algorithm for mutual funds style analysis which combines different statistical techniques and exploits information readily available at low cost. Objective, representative, consistent and empirically testable classification schemes are strongly sought after in this field, in order to give reliable information to investors and fund managers who are interested in evaluating and comparing different financial products. Institutional classification schemes, when available, do not always provide consistent and representative peer groups of funds. We propose a "return-based" classification scheme aimed at identifying and attributing mutual funds' styles by analysing time series of past returns.

Our classification procedure consists of three basic steps: (a) a dimensionality reduction step based on principal component analysis, (b) a clustering step that exploits a new evolutionary clustering methodology, and (c) a style identification step via a constrained regression model first proposed by William Sharpe. We have tested our algorithm on a wide sample of Italian mutual funds and have obtained satisfactory results with respect to (i) the agreement with the existing institutional classification and (ii) the correct identification of outlying funds. This evidence strongly suggests that combining different statistical techniques may produce useful quantitative tools for supporting the fund selection process, and encourages further research in this area.

Keywords: classification, evolutionary clustering, mutual funds style analysis.
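
Step (c), Sharpe's constrained "return-based" style regression, can be illustrated with the following sketch: non-negative weights summing to one are found by constrained optimisation of the tracking-error variance. The index and fund returns are simulated, and the optimiser choice (SLSQP) is ours, not necessarily the authors'.

```python
import numpy as np
from scipy.optimize import minimize

def style_weights(fund_returns, index_returns):
    """Sharpe-style constrained regression: find weights w >= 0, sum(w) = 1,
    minimising the variance of the tracking error fund - indices @ w."""
    k = index_returns.shape[1]

    def objective(w):
        return np.var(fund_returns - index_returns @ w)

    res = minimize(objective, x0=np.full(k, 1.0 / k), method="SLSQP",
                   bounds=[(0.0, 1.0)] * k,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x

# Hypothetical monthly returns: a fund that is roughly 60% equity / 40% bond
rng = np.random.default_rng(10)
indices = np.column_stack([rng.normal(0.008, 0.04, 120),   # equity index
                           rng.normal(0.003, 0.01, 120)])  # bond index
fund = 0.6 * indices[:, 0] + 0.4 * indices[:, 1] + rng.normal(0, 0.005, 120)
print("estimated style weights:", np.round(style_weights(fund, indices), 2))
```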

73. Statistical and Computational Models for the Forecasting of Television Shares

A. Bilotta, Via Milazzo, 15057 Tortona (AL) and P. Giudici, University of Pavia, 27100 Pavia
In this talk we shall present some statistical methodology to forecast television shares; in particular we shall consider linear regression models and MANOVA models. Our target is to select the model that minimises the error of prediction of the television shares. In order to do this we shall apply different methodologies.
The analysis will be performed by using S-Plus and PRONEL BN Learner software.
Keywords: Share, Association Rules, Bayesian Networks, Analysis of Correspondence

84. Data Mining Technology and its Application in Service Industry

Dr H. Santhanam, Department of Management Studies, University of Mumbai, India
Business decisions are significantly improved by taking full advantage of data warehousing and data mining technologies. The analysis of collected information, the organisation of the data and the discovery of the information hidden in it provide timely, actionable information. Data warehousing and data mining can also be cost-effective tools. This paper focuses on their usage in various industries, with emphasis on the healthcare industry: disease management, clinical decisions, and financial and regulatory issues.
It also discusses the obstacles to implementation. It focuses on the process and cultural changes needed to accompany the introduction of this technology in Indian industry and healthcare organisations. It suggests design methodologies to make it a cost-effective tool for business analysis and decision support.
A few case studies of marketing, financial and healthcare organisations in India are discussed.

87. Data Mining Applications in Supply Chain Management

Selwyn Piramuthu, Decision and Information Sciences, University of Florida, Gainesville, Florida 32611-7169, USA
Supply Chain Management has gained renewed interest among researchers in recent years. This is primarily due to the availability of timely information across the various echelons of the supply chain, as well as from the individual players within any given echelon in these networks. In the ideal case, a supply chain facilitates the availability of the right amount of the right product at the right price at the right place with minimal inventory across the network.
Data mining and knowledge discovery methods have been used in disparate domains to extract useful information from raw data. Although information plays a major role in effective functioning of supply chain networks, there is a paucity of studies that deal specifically with the dynamics of supply chain and how data collected in these systems can be used to improve their performance.
In this presentation, we provide an overview of previous work applying data mining/knowledge discovery methods in supply chain management. We also explore other possible applications of these methods, specifically for automated supply chain configuration as well as for validating node sequences in supply chains. Using examples, we illustrate the proposed methodology.

89. CRM and Datamining Application in Public Health

C.Voci and G.Cavrini, University of Bologna, Italy
The main target of this work was to apply industrial CRM analyses, such as segmentation and customer profiling, in public health. The main difference lay in identifying the 'customer', understood not as the consumer of goods but as the person who decides which product is bought. We first tried to translate all the concepts used in CRM into public health terms: General Practitioners (GPs) are the dealers, and they decide which drugs are bought by the patient. The patient is the final customer who uses the product but cannot choose the medicine alone; GPs have the faculty to prescribe drugs to the patient by selecting one from the different options offered by the market. For this reason we focused the analysis on GPs, but we also have to consider that hospitals can push them to save money on pharmaceutical goods, and that pharmaceutical firms may want to identify the best GPs to target for new products. We collected two years of prescriptions from GPs and calculated several indicators from them. Using different kinds of software we found different clusters, each identified by different properties. We assigned each GP a score based on the probability that he is the best GP to target in order to save money from the hospital's point of view. Using predictive analysis and simulation, a pharmaceutical firm can identify which GPs prefer to use newer, more expensive products instead of cheaper ones; this is the key to saving money on advertising. Even though we are still waiting for the results, we think that all the tools used in industrial CRM can be applied in public health.

94. Data Warehousing and Data Marts: Successful Data Mining also Includes Using Data on the Right Level of Aggregation and Transformation

Andrea Ahlemeyer-Stubbe, Database Management, Hauptstr. 34, D-77723 Gengenbach, Germany

Most companies have plenty of data but very little information. That is why you must place your data into a structure and context that actually answers your business questions. Doing this allows you to convert customers' multiple-touchpoint data into the information you need for successful data mining. During the data mining process, data preparation uses between 60% and 95% of time and resources. How the data are aggregated and transformed has a great influence on the possible data mining results. This talk gives an overview of basic data warehouse and data mart technologies, and of the typical aggregations and transformations that support different data mining methods.

After graduating in statistics in 1993, Mrs. Ahlemeyer-Stubbe worked as a database marketing specialist for several mail-order companies, where she gained extensive expertise in customer scoring and modelling and led several database marketing and data warehousing projects.

In 1999 she founded her company, Database Management, through which she consults on CRM, database marketing and data warehousing. In 2000 she also started teaching CRM and database marketing at the Fachhochschule Offenburg and the ECDM-Akademie.

As Vice President of ECDM, Mrs. Ahlemeyer-Stubbe is responsible for education and research and supports educational programs throughout Europe. ECDM is the first independent European forum dedicated to the professional use of database and relationship marketing.

Since December 2000 she has co-chaired the Data Mining working group of ENBIS, the European network for statisticians in business and industry.

99. Data Mining for Snow Avalanche Forecasting

Gabriele Stoppa, Paolo Scotton, Davide Fraccaroli and Paolo Cangiano

The stability of the snowpack depends on complex fracture mechanisms linked to a large number of field variables. The work starts from the archives of 32 snow-meteorological stations operating in the province of Trento.

The analysis allows forecasting models for snow events to be employed, besides yielding a wealth of supporting information for further study.

In particular, we address the following principal points:
* Operability and efficiency of the field sensors used;
* Type, number and size of the avalanches, together with the exposure and altitude of the release slopes;
* Characterisation of east- and west-facing sites;
* Interdependence between the variables involved in stability and instability conditions;
* Efficiency and power of the suggested models.

101. Business Modelling in the Web Mining Field

Paolo Mariani, Mario Mezzanzanica and Flavio Verrecchia, Facoltà di Scienze Statistiche, Università degli Studi di Milano Bicocca, 20126 Milano

The acceleration of technological change, the globalisation of markets, the loss of natural demographic turnover and the crisis of the national state are producing structural changes that create difficulties for the economic analysis models formalised during the development of the industrial economy.
With the web economy the customer faces a much wider market supply, and it changes without space-time limits (the proximity of a provider is no longer enough to drive the choice, because the whole world is reachable with a simple click). Besides the quality of the products and services supplied, to be competitive today it is necessary to have a site and to control its quality level: short response times, easy-to-find information and interesting products, secure transactions and a high degree of privacy protection.
Independently of the aim, most organisations measure the success of on-line initiatives by means of web server traffic. The number of page impressions or the number of "clicks" is the most popular instrument for measuring success. But log entries are not enough to obtain customer profiles; we also need database integration and synchronisation across all channels, statistical methodologies supplying coherent processes, clear business strategies, webhousing and web mining software, and internet platforms able to support the web-centric transformation of traditional business processes. Data mining means organising an effective team based on varied resources and company goals, involving different departments: information technology, marketing and sales. An effective data mining activity means making use of different human resources: a company business expert, a data and procedures expert, and an expert in quantitative methods for data analysis. The data mining methodology will be described in terms of a hierarchical process model, using task descriptions.

Six Sigma and quality improvement

21. Quality Improvement from the Viewpoint of Statistical Method

Jeroen de Mast, Institute for Business and Industrial Statistics of the University of Amsterdam

In the course of the twentieth century statistical methods have come to play an ever more important role in quality improvement in industry. For this purpose, statistical methods were made operational in the form of statistical improvement strategies, such as Taguchi's methodology, the Shainin System and the Six Sigma programme. In his recent PhD-thesis the author has proposed a rational reconstruction ('methodological framework') of statistical improvement strategies. The research followed the approach that is common in so-called 'reconstruction research'.

In this talk it is discussed how a scientific study can be set up for a methodological topic such as the one mentioned above. The talk will discuss a definition and demarcation of 'statistical improvement strategy'. Furthermore, it proposes a research method that can be followed for this type of investigation. The material and the literature that could be used are also addressed. The talk concludes with a short overview of the results that were obtained.

24. Enhancement of Six Sigma in the Light of Culture, Knowledge and Strategy

C.Zielowski, Mining University of Leoben, Austria, Dr Jöbstl, Successfactory Management Coaching GMBH and Dr Gamweger, Elogics Services GMBH

Six Sigma is a well-known business concept focused on cost reduction through process improvement. It mainly includes tool-kit deployment related to Quality Management, the Black Belt concept and the Zero Defect philosophy. But it lacks consideration of certain important elements, such as organisational culture, knowledge and strategy. An effort is made to enhance the classical elements of Six Sigma in order to achieve long-term business success.

In this model, Quality Management tools are systematically embedded into the steps of the DMAIC cycle. In addition, the theory of Knowledge Management is incorporated in order to improve knowledge agglomeration and the loop of Organisational Learning. Since the consequences of implementing new management systems are known for their impact on company culture, this aspect needs detailed discussion. The implementation of the proposed model requires detailed consideration of its integration into the strategic system. It also connects Six Sigma with strategy, politics and the management information system.
Two case studies are presented to demonstrate the potential of the model. The first describes the implementation of the Six Sigma concept, with the new enhancements, in administration, and focuses on cultural training of the employees. The second example describes a manufacturing unit concentrating on the efficient combination of creativity tools (including the Seven New Tools) and statistical methods.

25. Focused Self Assessment: Diagnosing, Prioritising and Improvement

Dr Jens J. Dahlgaard, Professor Lars Nilsson, Linkoping University, Sweden

The authors present and discuss a new type of questionnaire to be used for what we call "Focused Self-Assessment". The questionnaire was designed in co-operation with a large service company (Post Denmark) with almost 30,000 employees. The company has used the questionnaire twice - first in 1997, when it was decided to start a company-wide quality improvement process, and again in 1999, when the process had been running for two years. The questionnaire was designed to follow EFQM's Excellence Model. The idea of Focused Self-Assessment is to focus the self-assessment process on the vital few rather than the trivial many opportunities for improvement. Some results from using the questionnaire will be presented. The survey approach to self-assessment is described, and data from a benchmarking study of European post offices are used to compare different estimation procedures for importance weights.

31. State of Quality in Small and Medium-Sized Companies as a Previous Step on Six Sigma Implementation

Itziar Ricondo and Elizabeth Viles, Tecnun School of Engineering, University of Navarra, Spain

This report is part of a study being carried out by the University of Navarra with the aim of analysing the feasibility of applying the Six Sigma methodology in surrounding organisations, mainly small and medium-sized companies with an advanced quality culture.

Although Six Sigma can, in theory, be implemented in any kind of company, the best-known success stories have been reported by large companies (Motorola, GE, Allied Signal). These companies have spent large amounts of money on training, but have more than recovered their investment by achieving significant savings on large-scale projects.

Most of the enterprises in our region are small or medium-sized firms. Can Six Sigma be applied to them? Of course, people can be trained in the philosophy and the statistical tools. But can Six Sigma be profitably applied to these small companies, achieving breakthrough results? Profitability, one of Six Sigma's milestones, is harder to achieve in small companies. Firstly, managers are reluctant to invest in such expensive training courses. Secondly, they cannot afford to have Black Belts spend 100% of their time on improvement projects. Finally, as improvement projects are smaller, the potential savings are also smaller.

Which measures and statistical tools are known in small companies? What is their level of quality? Which aspects of quality management are covered, and where are the opportunities for improvement? We have carried out a first survey among the companies of the region in order to answer these questions and establish precisely where they stand on the way to business excellence. This information will shed light on further research in Six Sigma training, which we think is important not only for our region but also for other regions in Europe where small companies play an important role in the economy.

71. European Experiences with Implementing Six Sigma - It Does Take Much More than Statistical Thinking

Ane Storm Liedecke, GE Capital IT Solutions, Denmark

General Electric is known as one of the companies that has gained tremendously from the Six Sigma methodology since its implementation started in 1995/96. This journey could not have been as successful without a company prepared for ongoing change. GE had been working with productivity, BPR, change management and so on before implementing Six Sigma. This gave a unique opportunity to build a management approach around "Six Sigma - The Way We Work". But without shared values and a strong belief in a strong organisational culture, the Six Sigma implementation would have been a difficult task to fulfil.

This presentation is based on experiences with implementing Six Sigma over the last four years, starting in a Danish part of GE, supporting other European parts of GE, and later implementing Six Sigma for the European company Viterra Energy Services in Denmark and The Netherlands. From this experience it is obvious that visible and strong involvement from top management is a must. Apart from this, it also takes a clear organisational infrastructure built around the projects, an awareness that the process will consume time and resources for the first year or so, and a sustained focus on presenting and celebrating successes.

These are just some of the "lessons learned" to be presented; lessons seen both from a "helicopter perspective" and gained from working experience as "a lonely Black Belt" trying to implement Six Sigma.

91. Six Sigma in the Context of Logistic Processes

Dirk Lehnick and Claudia Uhlenbrock, University of Göttingen, Germany
Six Sigma methodology is concerned with utilising statistical tools to define, measure and analyse the performance of processes with the objective of improving them. It is assumed that the process under consideration is such that its performance can be quantified in terms of one or more attributes, and that these depend on a number of factors that can be controlled. In particular, Six Sigma focuses on reducing deviations in the process, because it is these that generally lead to deficiencies in quality, avoidable costs and customer dissatisfaction. The goal, then, is to arrange the process so that it becomes almost free of non-random deviations; one uses the relevant control factors to reduce avoidable variation as well as to calibrate the process to "ideal" levels.

This paper is concerned with the application of Six Sigma methodology in the context of logistic processes. The first issue considered is the identification of attributes or indicators that can be used to quantify performance in this field of interest. Here one needs to investigate whether apparently plausible performance measures such as delivery time, on-time delivery performance, proportion of correct supplies, customer satisfaction and logistic costs are appropriate. The problem is to determine which measures really quantify the important aspects of performance. One also needs to identify mechanisms and variables that can be used to change or tune the process. We will give examples to illustrate how - by means of statistical tools - the relevant measures and control factors can be determined in logistic processes, and thus how such processes can be improved. Included here are applications relating to inventory systems, transportation and delivery, all of which are key aspects, for example, in the context of e-commerce.

Workshops

WS1. The Beer Experiment

Combining Utility and Pleasure when Demonstrating Statistical Methods — 90 minute session to include using beer, glasses and other necessary equipment.
Frøydis Bjerke, MATFORSK, Norway.

When running workshops, lectures and training programs in Design of Experiments (DoE) and related topics, tutors often wish to provide "hands on" demonstrations that both illustrate the ideas in a pedagogic manner and break up the monotony of teaching. Unfortunately, such good examples are few; the helicopter experiment is frequently used because of its simplicity and because it can be used to elaborate on more sophisticated statistical matters (Box and Liu 1999).
At MATFORSK we have run several courses and seminars in recent years where DoE has been one of the topics. The Beer Experiment described in this article has been run for different audiences and has always been greatly appreciated. This versatile experiment can be used both as a first introduction to statistical methods for experimental work and as part of more advanced teaching of two-level factorial designs. The different stages of the workshop experiment can be emphasised to a varying degree, depending on context and audience. For the sake of completeness, all stages are described in detail below. If the workshop is given as part of Six Sigma or SPC training, more emphasis can be put on the early stages of the experiment.
The objective of the experiment is to study how frothing of beer poured into a glass is affected by beer sort, glass shape and glass tilting. This is done using a two-level factorial design in three factors, a 2^3 design. Main effects and interactions are observed. Methods like brainstorming, cause-and-effect diagrams and graphing of data can also be integrated into this workshop experiment, in order to illustrate statistical methods for the entire scientific process.
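
For readers who want to try the analysis stage of the workshop themselves, a minimal Python sketch of a 2^3 factorial analysis is given below; the froth-height responses are invented for illustration and are not data from the actual experiment:

    import itertools
    import numpy as np

    # Full 2^3 design in coded units (-1/+1) for the three factors of the
    # beer experiment: beer sort (A), glass shape (B), glass tilting (C).
    design = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)

    # Hypothetical froth heights (cm), one per run, in the row order of `design`.
    y = np.array([4.1, 2.0, 4.5, 2.4, 6.8, 3.1, 7.2, 3.6])

    # Model matrix with intercept, main effects and all interactions.
    A, B, C = design.T
    X = np.column_stack([np.ones(8), A, B, C, A * B, A * C, B * C, A * B * C])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # In a two-level factorial an "effect" is twice the regression coefficient.
    for name, c in zip(["mean", "A", "B", "C", "AB", "AC", "BC", "ABC"], coef):
        value = c if name == "mean" else 2 * c
        print(f"{name:>4}: {value:6.2f}")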

WS2. Comparing Traditional DoE Set-Up with Taguchi's: A Case Study in Furniture Industry

Co-ordinators: Zorica A. Veljkovic and Slobodan Radojevic, Faculty of Mechanical Engineering, Belgrade University.

Poster sessions

PO1. Tree-Based Automatic Classification of Digital Documents

Carla Brambilla, Istituto Di Matematica Applicata e Tecnologie Informatiche (IMATI), Italy, Claudio Cusano, Raimondo Schettini, Istituto per le Tecnologie delle Costruzioni

The automatic classification of digital documents is an unresolved challenge in the multimedia and imaging communities. Being able to classify images automatically and properly would indeed allow the unsupervised optimization of image processing strategies. This is particularly the case for cross-media color reproduction, where recognizing the class to which a processed image is likely to belong would allow the Color Management System to perform color adjustments automatically.
In this work we address the problem of distinguishing among photographs, graphical images, texts and compound images on the basis of low-level features such as color, edge distribution and image composition. By compound images we mean images that contain homogeneous regions of different types. Since the great variety and complexity of compound images makes it very difficult to build a training set for the compound class, we discarded this approach and defined a hierarchical strategy which first classifies an image as compound or not compound by verifying the homogeneity of its sub-images and afterwards classifies non-compound images as photographs, graphical images or texts. The 3-class classifier used in both steps of the strategy is derived by combining, through a majority vote, different tree classifiers built and validated according to the CART methodology. The biggest advantage of the tree approach is that it provides a very clear characterization of the conditions that determine when an image belongs to one class rather than another, thereby identifying the most discriminant features for the problem addressed and unmasking redundancy.
The results were reasonably good: indeed, we achieved an average classification accuracy of more than 90%. The photo class used in the experiments includes photographs of indoor and outdoor scenes, people and objects; the graphic class includes banners, logos, tables, maps, sketches and photo-realistic graphics; and the text class includes colored and black-and-white texts in various fonts. Finally, as examples of compound images we used pages of newspapers, newsletters, web pages and so on.
The work was performed as part of an ST Microelectronics research contract.

PO2. The Euro and the convergence criteria: How to find structural changes

Wolfgang Polasek, University of Basel and Bolzano

The introduction of the Euro has imposed rigid constraints on the 11 countries of the European Union that introduced the common currency on 1 January 1999. How important have these constraints been for the countries before and after 1999? Econometric methods for testing structural changes are explored, and we check whether they can be used to model convergence. We fit piecewise regression models and splines to the time series and test for the existence of a possible break point. We also examine whether differences in the volatility of the growth rates can be detected. We use information criteria and a Bayesian approach through marginal likelihoods for model selection.

PO3. The Treatment of Ordinal Data for Customer Satisfaction Measure

Annarita Roscino and Alessio Pollice, Dipartimento di Scienze Statistiche, Universita degli studi di Bari, Italy

In much marketing research, such as the measurement of customer satisfaction, data are collected through questionnaires. Responses are often classified into ordered categories, so the observed variables are ordinal and the rate of missing data may be very high. Generally, the missing observations are deleted and the data are assumed continuous in order to apply classical statistical methodologies (such as the Pearson correlation coefficient, normal-theory-based maximum likelihood estimation, etc.).
In this paper a method for the analysis of a categorical and incomplete data matrix is proposed. Our methodology is applied to data collected in a market survey by Fiat Auto in order to reveal the latent variables underlying customer satisfaction with the dealer from whom a new car was bought.
As a methodological assumption, ordinal variables are regarded as measurements of unobservable continuous variables. So, after multiple imputation of the missing values, the polychoric correlation matrix, which measures the correlations among the latent variables, is computed. This kind of correlation matrix can be used as a proper input for standard analyses implemented in common software.
In this application, the latent factors of the original data set are first calculated, and the weights of the obtained continuous factors on the ordinal variable 'global judgement' are then estimated by a logistic regression.

PO4. Business to Business Mentoring - Northumberland Cheese Company

Christine Burdon, University of Newcastle upon Tyne

17. A Nonparametric Regression Model with Bootstrap Confidence Intervals: Some Applications in Industrial and Clinical Fields

Alessandra Durio and Ennio Davide Isaia, Department of Statistics & Mathematics, University of Torino

In this paper we discuss the different behaviour of classical linear regression and nonparametric kernel regression methods.

In particular, we shall focus our attention on all those cases where small or moderate variations of the covariate have great consequences in explaining the phenomenon under study; in all these situations nonparametric kernel regression seems to be the more efficacious method, which in turn may lead to an appropriate choice of the parametric model.

Furthermore, with the aim of finding a local confidence interval for the mean of the estimator, which is biased, we shall use bootstrap techniques in order to approximate the law of the statistic. In particular, we shall construct variability bands for the regression mean and compare them with classical confidence intervals; this will be achieved by bootstrapping the kernel regression residuals of simulated data.

We shall provide several examples based on real data arising from industrial and clinical fixed design experiments and conclude with open research questions.
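
A minimal Python sketch of the general idea, a kernel regression estimate plus a residual bootstrap for pointwise variability bands, is given below; it uses simulated data and standard library routines rather than the authors' own implementation:

    import numpy as np
    from statsmodels.nonparametric.kernel_regression import KernelReg

    rng = np.random.default_rng(1)
    # Simulated fixed-design data with a nonlinear signal.
    x = np.linspace(0, 1, 100)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

    # Nonparametric (kernel) regression estimate of the mean function.
    kr = KernelReg(endog=y, exog=x, var_type="c")
    m_hat, _ = kr.fit(x)
    resid = y - m_hat

    # Residual bootstrap: refit the kernel regression on y* = m_hat + resampled
    # residuals, reusing the bandwidth selected on the original data.
    B = 200
    boot = np.empty((B, x.size))
    for b in range(B):
        y_star = m_hat + rng.choice(resid, size=resid.size, replace=True)
        boot[b], _ = KernelReg(endog=y_star, exog=x, var_type="c", bw=kr.bw).fit(x)

    # Pointwise 95% variability band for the regression mean.
    lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)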

26. An Application of Algebraic Geometry in Combinatorial Chemistry

Eva Riccomagno and Elwin de Wolf, University of Warwick, UK
This presentation gives the conclusive analysis of a study presented at the first ENBIS conference, Oslo, 17-18 September 2001. It refers to the design and analysis of a fraction of a 3^3 x 4 full factorial design. Each combination is one compound used as a catalyst. The aim is to optimise the recycling of the catalyst. Interactions are relevant.
Groups of 5 to 10 similar compounds were made at one time, thus creating an aliasing effect. After the first group of compounds was measured, it was clear that a regular fraction of the full factorial design would not have been appropriate. Algebraic geometry helped in the choice of each group by allowing us to determine a saturated set of terms to be included in an identifiable linear regression model for a given group, prior to making the compounds. It also gave information on the aliasing and confounding structure. Then, together with standard least squares estimation, it was used to analyse the results. In particular, we found it convenient to split the obtained data set according to the levels of one factor and to carry out a separate linear regression analysis for each level.
This kind of analysis, standard in statistics, is novel and promising in combinatorial chemistry.
This is joint work with Elwin de Wolf.

35. A Case Study in Applying Statistics in Human Resource Management: Predicting Whether Trainees Will Succeed While Shortening Induction Times

D.J.Stewardson and S.Y.Coleman, Industrial Statistics Research Unit, University of Newcastle upon Tyne, UK

This case study examines the application of statistical techniques to the problem of training and assessing the capability of new clerical staff. The study involves people whose main job was to price up drugs and other health aids supplied to the general public by the National Health Service. Their work involved learning how to apply hundreds of rules and regulations that tended to change over time, as well as pricing instructions that also changed regularly. Before this project, trainees were typically assessed by 100% checking of all output over an induction period of up to 52 weeks, owing to the complexity of the tasks and the high performance standards required. Successful trainees had to achieve a minimum average daily rate of 3000 completed transactions at an accuracy rate better than 97.5%. If these requirements had not been met within 52 weeks, the trainee was dismissed. The project described here shows how the organisation was persuaded to cut the trainee work sampling rate to 10% of output and the time to final decision to 20 weeks, with the majority of cases being decided within 15 weeks. The tools used included a decision algorithm incorporating rules based on changes in linear regression slopes, and a rule based on the fairly rare use of a linear discriminant function.
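
The following Python sketch conveys the flavour of such a rule, combining an output-slope feature with a linear discriminant function; the trainee records and feature definitions are invented stand-ins for the confidential data used in the case study:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(2)
    # Hypothetical week-15 summaries for 60 past trainees: the slope of daily
    # output against week, and current accuracy (%), plus the eventual outcome
    # (1 = met the 3000/day and 97.5% targets, 0 = dismissed).
    outcome = rng.integers(0, 2, size=60)
    features = np.column_stack([
        rng.normal(100 + 60 * outcome, 25),      # output slope (transactions/day per week)
        rng.normal(96.5 + 1.2 * outcome, 0.6),   # accuracy at week 15
    ])

    lda = LinearDiscriminantAnalysis().fit(features, outcome)
    # Early estimate of the probability that a trainee will eventually succeed,
    # available at week 15 rather than week 52.
    print(np.round(lda.predict_proba(features[:5])[:, 1], 2))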

36. MSEP Estimates for Principal Components and Partial Least Squares Regression

Dr Bjorn-Helge Mevik, Matforsk, The Norwegian Food Research Institute, Norway
The mean squared error of prediction (MSEP) of future observations is frequently used to assess the performance of regressions. It is also used to choose the optimal number of components in principal components regression (PCR) or partial least squares regressions (PLSR).
The MSEP can be estimated by applying the regression to an independent test set. For several reasons, a (big enough) test set is not always available. In such situations, the MSEP has to be estimated from the learning data.
Leave-one-out cross-validation is perhaps the most widely used internal estimate. It is nearly unbiased, and easy to implement and understand. It has, however, been criticised for being variable, and therefore not consistent when used for model selection. Alternative estimates, usually based on k-fold cross-validation or the bootstrap, have been proposed to reduce this variability.
Most theoretical results regarding the properties of these estimates have been developed under the assumption that the number of variables is small compared to the number of observations. Also, most empirical comparisons have been performed with such data.
It is not obvious whether these results are valid in situations where PCR and PLSR are used: when there are more variables than observations.
The talk will present results from simulation and real data studies, comparing several competing MSEP estimates: leave-one-out cross-validation, k-fold and adjusted k-fold cross-validation, and a number of bootstrap estimates.
The results indicate that some of the alternative estimates can have less variance than leave-one-out cross-validation without gaining too much bias. The adjusted k-fold cross-validations can also be good candidates, if only because of their computational efficiency.
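
A minimal Python sketch of this kind of comparison, using simulated "wide" data and scikit-learn in place of the author's own code (only leave-one-out and 10-fold cross-validation are shown):

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import KFold, LeaveOneOut, cross_val_predict

    rng = np.random.default_rng(3)
    # Simulated "wide" data, as is typical for PLSR: more variables than samples.
    n, p = 40, 200
    X = rng.normal(size=(n, p))
    y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.5, size=n)

    def msep(n_comp, cv):
        """Cross-validated mean squared error of prediction."""
        pred = cross_val_predict(PLSRegression(n_components=n_comp), X, y, cv=cv)
        return np.mean((y - pred.ravel()) ** 2)

    for a in range(1, 8):
        loo = msep(a, LeaveOneOut())
        k10 = msep(a, KFold(n_splits=10, shuffle=True, random_state=0))
        print(f"{a} components: MSEP(LOO) = {loo:.3f}, MSEP(10-fold) = {k10:.3f}")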

48. Design of Experiments for Static Influences on Harmonic Processes

Winfried Theis, Collaborative Research Center, Universität Dortmund, Germany
In mechanics it happens that some fixed influencing factors determine the nature of a harmonic process. This can be modelled by regression on the periodogram ordinates at the relevant frequencies. Thereby the time domain is bypassed, and a static model can be applied. Since it is known that periodogram ordinates are (non-central) chi-squared distributed when the noise process is Gaussian, it seems natural to tackle the problem with generalised linear models. But in the case of harmonic processes the ordinates at the relevant frequencies typically show large non-centrality parameters, and therefore a normal approximation may be an alternative.
Prior information about the error distribution, the parameter estimates and the link function is needed to construct an optimal experimental design for a generalised linear model. It is therefore of interest to assess the loss incurred by using a normality assumption in the construction of the experimental design. This possible loss is investigated in a simulation study. The experimental design of the simulation study itself is chosen to span a wide range of possible situations. When the signal-to-noise ratio is not too small for some of the design points, the normality assumption seems to be appropriate.
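
A minimal Python sketch of the modelling idea, regressing the periodogram ordinate at a known frequency on a two-level factor, once by a Gamma GLM and once by an ordinary (normal-approximation) regression; the data and the factor effect are simulated:

    import numpy as np
    from scipy.signal import periodogram
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    # Simulated experiment: a two-level factor x shifts the amplitude of a
    # harmonic signal observed in Gaussian noise.
    freq = 25 / 256
    levels, ordinates = [], []
    for _ in range(40):
        x = rng.choice([-1.0, 1.0])
        t = np.arange(256)
        signal = (2.0 + 0.8 * x) * np.sin(2 * np.pi * freq * t) + rng.normal(size=256)
        f, pxx = periodogram(signal)
        ordinates.append(pxx[np.argmin(np.abs(f - freq))])  # ordinate at the known frequency
        levels.append(x)

    X = sm.add_constant(np.array(levels))
    y = np.array(ordinates)
    gamma_fit = sm.GLM(y, X, family=sm.families.Gamma()).fit()  # GLM route
    normal_fit = sm.OLS(y, X).fit()                             # normal approximation
    print(gamma_fit.params, normal_fit.params)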

54. The Euro and the Convergence Criteria: How to Find Structural Changes

Wolfgang Polasek, University of Basel and Bolzano, Italy

The introduction of the Euro has imposed rigid constraints on the 11 countries of the European Union that introduced the common currency on 1 January 1999. How important have these constraints been for the countries before and after 1999? Econometric methods for testing structural changes are explored, and we check whether they can be used to model convergence. We fit piecewise regression models and splines to the time series and test for the existence of a possible break point. We also examine whether differences in the volatility of the growth rates can be detected. We use information criteria and a Bayesian approach through marginal likelihoods for model selection.

Keywords: Structural breaks, piecewise regression, convergence, model selection.
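
A minimal Python sketch of the break-point idea, fitting piecewise regressions over candidate break dates and comparing them with the no-break model by an information criterion; the quarterly growth-rate series is simulated, not actual Euro-area data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    # Simulated quarterly growth rates, 1990Q1-2002Q4, with a level shift
    # after observation 36 (1999Q1).
    t = np.arange(52)
    y = 2.5 + 0.02 * t - 1.0 * (t >= 36) + rng.normal(scale=0.4, size=t.size)

    def fit_with_break(tau):
        """OLS with a trend and a level shift at candidate break point tau."""
        X = sm.add_constant(np.column_stack([t, (t >= tau).astype(float)]))
        return sm.OLS(y, X).fit()

    no_break = sm.OLS(y, sm.add_constant(t)).fit()
    best_tau = min(range(8, 45), key=lambda tau: fit_with_break(tau).bic)
    print("BIC, no break     :", round(no_break.bic, 1))
    print("BIC, break at", best_tau, ":", round(fit_with_break(best_tau).bic, 1))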

61. Predicting Digester Failure and Identifying Latent Fault Causes using Multivariate Techniques and Bayesian Analysis

Alexander Chakhunashvili, PhD Student, Chalmers University of Technology, Gothenburg, Sweden; Lasse Hübinette, Lic. Sc., Nordic Financial Systems/Chalmers University of Technology, Gothenburg, Sweden; Bo Bergman, SKF Professor, Chalmers University of Technology, Gothenburg, Sweden

When digesting wood chips into pulp, various kinds of faults resulting in digester failure may be encountered. One such fault, called a hang-up, occurs when the pulp stops moving downwards in the digester because of certain latent causes. A large number of variables are measured, and data concerning hang-ups and other kinds of faults are collected. The problem in focus is how to predict digester failure and make an accurate fault diagnosis.

In our approach to this problem, we divide the data into two sets, collected from the normal state and the anomalous state of the process, respectively. After standardising the variables and normalising the anomalous data set, where the response variable is the statistical distance from the normal state of the process to the anomaly, a principal component analysis is conducted to reduce the number of variables. To discover when the state of the process is approaching an anomaly, a Bayesian updating scheme will be utilised, which should also pinpoint the latent fault causes of the process upset.
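
The following Python sketch illustrates the standardisation, dimension-reduction and distance part of this approach (the Bayesian updating step is omitted); the sensor data are simulated:

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(6)
    # Simulated process data: many measured variables, normal-state reference set.
    normal = rng.normal(size=(500, 30))
    new_obs = rng.normal(size=(20, 30))
    new_obs[15:] += 1.0                     # the last observations drift towards an upset

    scaler = StandardScaler().fit(normal)
    pca = PCA(n_components=5).fit(scaler.transform(normal))

    def distance_from_normal(obs):
        """Hotelling-type statistical distance from the normal state, computed
        in the reduced principal-component space."""
        scores = pca.transform(scaler.transform(obs))
        return np.sum(scores ** 2 / pca.explained_variance_, axis=1)

    print(np.round(distance_from_normal(new_obs), 1))  # large values flag an approaching anomaly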

69. Non-Symmetric Multivariate Monitoring of Time Correlated Data

Alessandro Fasso, University of Bergamo, Italy, and S. Locatelli, Brembo S.p.A., Italy
Standard multivariate control charts use quadratic forms based on the multivariate Gaussian distribution. These charts have the same power for process shifts along the Gaussian elliptic contours in any direction.
In many applications quality deteriorates as one or more measures increase. For example, in brake disk manufacturing, geometrical deformation and disk thickness variation have to be kept small. Other concurrently measured quantities, possibly correlated, may have symmetrical tolerance limits.
One-sided multivariate MEWMA control charts have been introduced both for environmental monitoring and industrial quality control (see e.g. Fasso 1998 and 1999). In this paper we extend this approach to cope simultaneously with symmetrical and non-symmetrical specifications.
Moreover, two applications are discussed. The first is based on brake disk production testing. The second is related to environmental monitoring and is concerned with space-time monitoring of particulate matter and alerting.
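
As a rough sketch of the one-sided idea (a simplified stand-in, not the authors' actual chart), the following Python code runs a multivariate EWMA recursion and keeps only upward departures before forming the monitoring statistic; the data and the shift are simulated:

    import numpy as np

    rng = np.random.default_rng(7)
    # Simulated bivariate quality data (e.g. deformation and thickness variation),
    # positively correlated, where only increases matter.
    sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
    x = rng.multivariate_normal([0.0, 0.0], sigma, size=200)
    x[150:] += [0.8, 0.5]                      # upward shift after sample 150

    lam = 0.2
    sigma_z = lam / (2 - lam) * sigma          # asymptotic covariance of the EWMA vector
    sigma_z_inv = np.linalg.inv(sigma_z)

    z = np.zeros(2)
    stat = []
    for xt in x:
        z = lam * xt + (1 - lam) * z
        z_plus = np.maximum(z, 0.0)            # keep only upward departures
        stat.append(z_plus @ sigma_z_inv @ z_plus)

    print(np.round(stat[145:160], 2))          # the statistic rises after the shift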

74. 18 Months to Make an Analytical CRM Dream Come True:

Omnitel-Vodafone in B2B churn prevention. F.Cavallo and A.Bernorio (Omnitel Vodafone); G.Cuzzocrea and P.Bauce (Nunatac)

Implementing Analytical CRM - are you ready to go? You need money, commitment, technology, the right skills and management ... just as John Belushi had sunglasses, half a tank of gas and, above all, a strong and determined will to go.
Are you worried about the monsters you will encounter before getting a possible return on your investment? We would be glad to share what the following points actually mean:
* Allow the IT resources the necessary time to prepare the infrastructure for "the big project", but do not wait until the CRM environment is available;
* transform your business goal into a non-trivial analytical design;
* build a predictive model for churn prevention, considering the future data production process, the continuously changing characteristics of the offer, and the timing and details of the customer caring activity;
* implement in a new production environment a data mining process developed off-line;
* do not expect to reach paradise, but achieve a reasonable quality standard from which to start successful prevention campaigns.

75. KALEIDOS: A Splash of Colour for the "Silvery Haired"

Guido Cuzzocrea and Salvina Piccarreta - NUNATAC, Italy and Linda Pimpinelli, Ignazia Vitali - CONSODATA, Italy

Where did those people heading "for the top" actually get to? Have the "career singles" decided to settle down? Finally, are the "movers, shakers and VIPs" still in their respective positions? Nearly 10 years after the ISTAT census, and nearly five after the SEAT geo-demographic cluster segmentation based on those data was put on the market, Giallo Dat@ has carried out a project to refresh the colouring of the 320,000 Italian census micro zones.

The available data sources used to reach this ambitious objective are heterogeneous in terms of detail. They also cover diverse information categories, and for comparison purposes, a number of extremely consistent samples have been utilised, with, however, no overlapping.

This paper presents the results of the project and describes the methodology used, which enabled us to successfully combine a wide range of sources and to construct a classification process that was at the same time sound and rich in information. SAS users can follow a procedure which leads to an optimal utilisation of the integration potential and the completeness of the analysis techniques available in the system. For marketing users, surprises abound: a splash of colour for the "silvery haired".

77. Forewarned is Forearmed: Preventing Defections of Our Best Customers

Fabio Marchetti and Filippo Avigo - BIPOP CARIRE and Alberto Saccardi and Salvina Piccarreta - NUNATAC, Italy

Bipop-Carire is one of Italy's most active financial groups in asset management and on-line banking. In the spring of 2000, under the supervision of the Marketing Division, Bipop-Carire started its CRM project.

In September 2000, in cooperation with Nunatac, a company focused on building decision support systems, Bipop-Carire started the activities concerning the development of data mining environments, analysis models, and tools to interface with branches and financial planners during marketing campaigns.

Our presentation will illustrate one of the most significant actions adopted in the winter of 2001-2002 aimed at preventing customer defections. Specifically, the operational process we implemented included the following steps:
* Development of a churn model to identify the customers most likely to defect;
* Creation of a model to single out, based on their profile, customers who may be targeted as likely buyers of insurance products, which have proved to reinforce customer loyalty;
* Customer assessment based on their current and potential values;
* Development of a three-dimensional matrix - the three dimensions being the three measures generated by the models above - to identify marketing actions targeted at preventing defections of our best customers.
Enterprise Miner allowed us to accomplish the project in a few months. SAS/AF was the front-end application we used as the tool for communication with branches and financial planners and for easy and immediate reporting of results.
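
The models themselves were built with Enterprise Miner; purely as an illustration of the churn-scoring step, here is a small Python sketch on invented customer features:

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(8)
    # Invented customer features: months as a customer, assets held, number of
    # products, branch contacts in the last quarter.
    n = 5000
    X = np.column_stack([
        rng.integers(1, 120, n), rng.lognormal(10, 1, n),
        rng.integers(1, 6, n), rng.poisson(2, n),
    ]).astype(float)
    churn = (rng.random(n) < 1 / (1 + np.exp(0.02 * X[:, 0] - 1.5))).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, churn, random_state=0)
    model = GradientBoostingClassifier().fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]       # defection propensity per customer
    print("AUC:", round(roc_auc_score(y_te, scores), 3))
    # Customers would then be ranked by score and crossed with their value to
    # target retention offers at the high-value, high-risk segment.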

78. TITOSIM Project ... for a New Supporting System Along the Whole Product Development Process

Cristina Randazzo and Nicoletta Francone (CRF, Italy)

There is a strong demand for methodologies and tools to help European industries improve their competitive position by reducing time to market and increasing the quality of the final product. Competition is not the only aspect that must be taken into account. The capacity of products, and of the processes used to produce them, to meet increasingly stringent regulations in terms of environmental impact and safety is an example of another very relevant issue that industry faces. The European project TITOSIM (GRD1 - 2000 - 25724) arose in this context.

The basic idea of this project is to simultaneously find a design solution for both the product and the related processes which is optimised, fast and robust, taking into account the needs of the customer. Optimisation is attained via an intelligent exploration of the set of possible solutions, speed via the substitution of the physical or simulated systems with consistent meta-models, and robustness through the consideration of all possible sources of variability along the product development process. In these respects, the TITOSIM project extends to the whole product development process a successful approach to accelerating simulations developed during the previous European project (CE)2, "Computer Experiments of Concurrent Engineering", while the new functionalities will cover different types of experiments, the capturing of subjective belief, robust design, sensitivity analysis and multi-objective optimisation.

In summary, the main objectives of this project are:
* to implement a software platform that integrates and develops statistical methodologies in product development;
* to support the user in the definition of product objectives through the analysis of the Voice of the Customer;
* to develop a software prototype able to deal with simulator and physical data coming from both design and production;
* to measure and guard against variability of input factors at the production stage by optimisation at the design stage.

83. Model-Robust and Model-Sensitive Design Strategies


Peter Goos, André Kobilinsky, Timothy O'Brien and M. Vandebroek, K.U. Leuven, Belgium

The assumption that underlies most research work in experimental design is that the proposed model adequately describes the response of interest. It is unlikely, however, that the experimenter is completely certain that any model will be adequate, and this should be reflected in the experimental design. Thus, instead of looking for the optimal design to estimate the stated model, model-robust and model-sensitive designs have been proposed in the literature to account for model uncertainty; see for example Steinberg and Hunter (1984).

In the model-robust approach, one looks for designs that minimize bias and therefore yield reasonable results for the proposed model even if the true model is different from the one estimated. The pioneering work in this area is Box and Draper (1959), which introduces a design strategy that minimizes the average mean squared error over the region of interest, assuming the true model is composed of a primary model - the one that will eventually be estimated - plus potential terms. Their criterion can be decomposed into the sum of a bias component and a variance component. Recently, Kobilinsky (1998) developed a model-robust design criterion using a Bayesian approach. By introducing a prior distribution for the potential terms, he was able to develop a more general design criterion that can be used to obtain computer-generated designs that possess good bias and variance properties.

Model-sensitive design approaches, on the other hand, provide designs that facilitate improvement of the model by detecting lack of fit. O'Brien (1994), building on the work of Atkinson and Donev (1992), develops a criterion to find designs that are well suited to fitting the primary model and that enable one to test for lack of fit in the direction of the potential terms. Further, DuMouchel and Jones (1994) used a Bayesian approach to obtain designs that are less sensitive to the model assumption. Although the latter authors claim that their criterion leads to designs that are more resistant to the bias caused by the potential terms, our research suggests that their designs in fact resemble those of O'Brien (1994) in focusing more on the variance and the detection of lack of fit than on the bias.

We develop a new criterion that takes both model-robust and model-sensitive aspects into account. The new criterion thus combines efficiency in estimating the primary terms, protection against bias caused by the potential terms, and the ability to test for lack of fit, and as such it increases the knowledge of the true model. The resulting designs perform well when evaluated using several criteria, and several examples are provided to illustrate the new criterion and its performance.

85. Modelling the Stability of Commercial Enzyme Formulations

Jesper Frickmann
Enzymes are biological molecules, and they degrade rather quickly. Therefore, formulations of commercial enzyme products must stabilise and preserve the enzymes, and one of the key quality parameters is thus product stability. At Novozymes A/S, product batches are routinely sampled for stability studies in order to provide a solid empirical basis for evaluating product stability. The data collected are evaluated with a non-linear model postulating exponential decay over time, with a decay rate increasing exponentially with temperature (Arrhenius' law). Since samples are stored for varying periods of time, not all measurements of enzyme activity are made on the same day, and correlation between measurements from the same day must be accounted for. This is handled by combining a non-linear curve fit with the REML objective function known from linear mixed models. The model is implemented as an MS Excel spreadsheet, retrieving data from and storing results in an Oracle database via ODBC.
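
A minimal Python sketch of the decay-plus-Arrhenius part of such a model (ignoring the day-to-day correlation and the REML machinery described above), fitted to invented stability data:

    import numpy as np
    from scipy.optimize import curve_fit

    # Invented stability data: residual activity (%) after t weeks of storage
    # at absolute temperature T (Kelvin).
    t = np.array([0, 4, 8, 12, 0, 4, 8, 12], dtype=float)
    T = np.array([278, 278, 278, 278, 303, 303, 303, 303], dtype=float)
    activity = np.array([100, 97, 94, 91, 100, 84, 71, 60], dtype=float)

    def decay(X, a0, k_ref, Ea):
        """Exponential decay in time with an Arrhenius-type rate:
        k(T) = k_ref * exp(-Ea/R * (1/T - 1/T_ref))."""
        t, T = X
        R, T_ref = 8.314, 293.0
        k = k_ref * np.exp(-Ea / R * (1.0 / T - 1.0 / T_ref))
        return a0 * np.exp(-k * t)

    params, _ = curve_fit(decay, (t, T), activity, p0=[100.0, 0.01, 50000.0])
    print(dict(zip(["a0", "k_ref", "Ea"], np.round(params, 4))))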

86. A Spatial Model for Detecting Changes in Paper Production

Patrick Brown, Lancaster University, UK
This talk addresses the problem of detecting changes in the production of paper using time series data on paper opacity. For every 14 seconds of data a statistical model is fitted and estimates of six parameters are produced. These six values serve to summarise the features of that segment of data, and by tracking the values over time the behaviour of the manufacturing process can be evaluated. A certain amount of variability in the parameters is expected even when the process is in control, so a time series model for the parameters is formulated, and detecting departures from this null model becomes the problem of interest.
The model used is related to shot noise processes and Poisson cluster processes, with parameters estimated using maximum likelihood in the frequency domain. Modelling "flocs", clusters of paper fibres, is the primary motivation of the model. A six parameter, additive model provides a good fit to the data and can provide parameter estimates and standard errors quickly even with large datasets.

88. Nonlinear Multivariate Analysis as a Support for the Internal Auditing Activities at Monte dei Paschi di Siena Group

In this paper we try to show how multidimensional explorative techniques can help internal auditors to address the problem of selecting the dependencies to inspect and to understand where to focus monitoring activities.

96. Defect Reduction in an Automotive Components Company by the Use of Six Sigma

G.Arcidiacono, G. Campatelli and P. Citti, University of Florence, 50139, Italy

New products require a high quality standard, so every aspect of a company has to be addressed with a statistical approach in order to meet this goal. This is, in part, the basis of the Six Sigma approach: to evaluate every process using statistical tools.

In this work we explain how the Six Sigma approach can be applied in an automotive company and what kind of results it can deliver. In particular, we treat the case of casting a plastic component (a bumper) for the automotive industry. This component presented a lot of defects whose causes were not clear, so the company opened a Six Sigma project with the aim of solving the problem by drastically reducing the number of defects produced, developing the DMAIC phases step by step.

The project started using classical tools, such as Pareto charts and cause-and-effect diagrams, to evaluate the classes of defects and their possible causes. Statistical tools were then used in order to reduce the defects that had proved to be the most frequent. The tools used were ANOVA, to evaluate whether there was a relationship between the materials used and the frequency of defects, and DOE, to find the best combination of all the casting parameters. Finally, a quantification of the cost saving due to the improved process is given, to prove the real advantage of the Six Sigma project.
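
As a small illustration of the two statistical steps mentioned, a one-way ANOVA on the material and a factorial analysis of the process settings, here is a Python sketch on invented moulding data:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    rng = np.random.default_rng(9)
    # Invented casting trials: defect rate (%) by material batch and two coded
    # process factors (melt temperature, injection pressure).
    df = pd.DataFrame({
        "material": rng.choice(["A", "B", "C"], 48),
        "temp": rng.choice([-1, 1], 48),
        "pressure": rng.choice([-1, 1], 48),
    })
    df["defects"] = (5 + 1.5 * (df["material"] == "B") - 0.8 * df["temp"]
                     + rng.normal(scale=0.7, size=48))

    # ANOVA: does the material batch affect the defect rate?
    print(sm.stats.anova_lm(ols("defects ~ C(material)", df).fit(), typ=2))

    # Factorial analysis: which process settings reduce defects?
    print(ols("defects ~ temp * pressure", df).fit().params)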

Software session

6. Screening Factors using Small, Optimal, Equireplicated, Resolution IV Fractions of 2^k Designs

Gary W.Oehlert, School of Statistics, University of Minnesota and Pat Whitcomb, Stat-Ease Inc
Paper withdrawn.

32. Multivariate and Multistrata Nonparametric Tests: The NPC Method

Livio Corain, Dipartimento di Tecnica e Gestione dei Sistemi Industriali, Università di Padova, Stradella S. Nicola 3, 36100 Vicenza; Fortunato Pesarin, Dipartimento di Scienze Statistiche, Università di Padova, Via C. Battisti 241, 35121 Padova; Luigi Salmaso, Dipartimento di Scienze Statistiche, Università di Padova, Via C. Battisti 241, 35121 Padova

In many scientific disciplines and industrial fields, researchers and practitioners are often faced with complex problems when comparing two or more groups using classical parametric methods, although real problems rarely satisfy the stringent assumptions required by such methods. The NPC methodology (Pesarin, 2001) frees the researcher from the stringent assumptions of parametric methods and allows a more flexible analysis, both in terms of the specification of multivariate hypotheses and in terms of the nature of the variables involved in the analysis. One of the most relevant features of the NPC test is that it does not require a model for the dependence among variables.

Considering a k-dimensional hypothesis testing problem, the NPC solution proceeds in two phases. Firstly, we define a suitable set of k one-dimensional permutation tests, called partial tests; each partial test examines the marginal contribution of a single response variable to the comparison between the groups. The second phase is the nonparametric combination of the dependent tests into one second-order combined test, which is suitable for testing possible global differences between the multivariate distributions of two or more groups. When there is a stratification variable, there are two combination levels: the combination of the partial tests into a set of second-order combined tests within each stratum, and a further combination of these tests into a single third-order combined test.
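
A minimal numpy sketch of this two-phase idea, with mean-difference partial tests combined by Fisher's combining function on simulated two-group data (this is an illustration of the principle, not the NPC Test software):

    import numpy as np

    rng = np.random.default_rng(10)
    # Two groups, k = 3 possibly dependent response variables.
    n1, n2, k = 15, 15, 3
    X = np.vstack([rng.normal(0.0, 1, (n1, k)), rng.normal(0.6, 1, (n2, k))])
    labels = np.r_[np.zeros(n1), np.ones(n2)].astype(bool)

    def partial_stats(lab):
        """Partial test statistics: difference in group means, one per variable."""
        return X[lab].mean(axis=0) - X[~lab].mean(axis=0)

    B = 2000
    obs = partial_stats(labels)
    perm = np.array([partial_stats(rng.permutation(labels)) for _ in range(B)])

    # Partial p-values (one-sided) for the observed data and for every permutation,
    # then Fisher's combining function applied row by row; using the same
    # permutations at both levels preserves the dependence between partial tests.
    all_stats = np.vstack([obs, perm])
    p_all = (perm[None, :, :] >= all_stats[:, None, :]).mean(axis=1)
    fisher = -2 * np.log(np.clip(p_all, 1e-12, None)).sum(axis=1)
    p_combined = (fisher[1:] >= fisher[0]).mean()
    print("partial p-values:", np.round(p_all[0], 4), " combined p:", round(p_combined, 4))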

In this paper we give a short outline of the implementation of the NPC methodology and present some real case studies, with solutions provided by the new statistical software NPC Test 2.0 (more details at www.methodologica.it), which completely implements the NPC methodology and offers both flexibility and a user-friendly interface.

We believe that in many experimental and observational studies the NPC method together with the software will offer a significant contribution to successful research in business and industrial statistics.

33. The Fourpoint Method

Arnt Schoning, Schoning Data, Sweden

The title of the submission is 'The Fourpoint Method'. It is based on a computer program called 'Fourpoint'. I developed this program because I have noticed that several projects present their results by means of curves based on an unnecessarily high number of points. In my opinion, four points (y/x values) should in many cases suffice to present the results, at least as preliminary information. In the program you can use one dependent variable (y) as a function of an independent variable (x), with single or duplicate y-values. You can also use two independent variables (x- and z-values).
The results are presented by means of curves based on seven different mathematical equations, of which you can select the one that fits the points best. The curves are those most commonly used in chemical or biological applications. The significance of the linear, quadratic and cubic components is given. For two independent variables the significance of each is given; this should also reveal any interactive effect. Maximum and minimum points of the curves are given if they exist. All input values can be changed during the run. The region on the ordinate (y-axis) can be selected during the run to give the desired presentation of the curves. The program will be presented by means of PowerPoint slides.
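
The flavour of the method can be conveyed in a few lines of Python; numpy polynomial fits stand in here for the Fourpoint program's own seven curve types, and the four data points are invented:

    import numpy as np

    # Four (x, y) observations, as in the Fourpoint idea (invented values).
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 4.2, 3.0])

    # Fit linear, quadratic and cubic curves; the cubic interpolates the four
    # points exactly, the lower orders smooth them.
    for degree in (1, 2, 3):
        coef = np.polyfit(x, y, degree)
        resid = y - np.polyval(coef, x)
        print(degree, np.round(coef, 3), "SSE =", round(float(resid @ resid), 4))

    # Stationary point of the quadratic fit, if one exists: y' = 2*a*x + b = 0.
    a, b, _ = np.polyfit(x, y, 2)
    print("quadratic extremum at x =", round(-b / (2 * a), 3))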

58. Dependency Networks and Bayesian Networks for Web Mining

E. Blanc and C. Tarantola, University of Pavia

Following the approach described by Heckerman et al. (2000), we present an application of dependency networks and Bayesian networks to the analysis of a click-stream data set. Our target is to discover which paths are followed most often by the users. The relation between one web page and another is represented by a directed graph. Whereas Bayesian networks are directed acyclic graphs, dependency networks may contain cyclic structures. The analysis will be performed using the WinMine Toolkit software.

Keywords: Web Mining, Bayesian Networks, Dependency Networks, Click-stream Data Analysis, Log file.
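
As a simple warm-up to the graph view of click-stream data (a first-order transition count, rather than the dependency-network and Bayesian-network models analysed with WinMine), here is a Python sketch with invented sessions:

    from collections import Counter
    import networkx as nx

    # Invented click-stream sessions: each is a sequence of visited pages.
    sessions = [
        ["home", "products", "cart", "checkout"],
        ["home", "products", "product_detail", "cart"],
        ["home", "search", "product_detail"],
        ["home", "products", "cart", "checkout"],
    ]

    # Count page-to-page transitions and store them as a weighted directed graph.
    transitions = Counter((a, b) for s in sessions for a, b in zip(s, s[1:]))
    G = nx.DiGraph()
    for (a, b), w in transitions.items():
        G.add_edge(a, b, weight=w)

    # The most frequently followed links give a first look at popular paths.
    print(sorted(G.edges(data="weight"), key=lambda e: -e[2])[:5])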

59. Statistical Designs for Mixture Experiments using STAVEX

Ekkehard Glimm, AICOS Technologies AG, CH-4057 Basel, Switzerland
Experiments involving mixtures of ingredients pose a challenge to the statistician looking for appropriate experimental designs. The fact that the components of the mixture typically add up to 100% prevents orthogonal designs and makes it hard to distinguish between the effects of the individual mixture factors. More often than not, additional restrictions, such as limits on the ratios of mixture factors, further complicate the task. Of course, different questions call for different designs; for example, vertex-centroid designs are useful for screening, while optimisation might require D-optimal designs.
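
For instance, a vertex-plus-centroid (simplex-centroid) design for three mixture components can be written down directly; the short Python sketch below generates one, purely as an illustration of the kind of design STAVEX constructs and evaluates:

    from itertools import combinations

    def simplex_centroid(q):
        """Simplex-centroid design for q mixture components: all vertices,
        all binary blends, and so on up to the overall centroid; every row
        sums to 1 (i.e. 100% of the mixture)."""
        points = []
        for r in range(1, q + 1):
            for subset in combinations(range(q), r):
                points.append([1.0 / r if i in subset else 0.0 for i in range(q)])
        return points

    for run in simplex_centroid(3):
        print(run)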
Mixture problems arise in various industrial applications, such as dyestuff, tire and concrete production, sustained-release drug formulation and many other fields. In the talk, an example from the pharmaceutical industry will be used to demonstrate how the experimental design software STAVEX handles these problems. In addition to the mixture factors, the example also includes non-mixture factors from the production process (such as stirrer speed and temperature).
With respect to the experimental designs, it will be demonstrated how one can move through different stages (from screening via modelling to optimisation) towards an optimum setting of the factors, ever refining one's questions on the basis of analysis results from the previous stage.
Regarding the analysis, the focus will be on the interpretation of effects estimates. These depend on a pre-specified reference mixture as well as on the way the restrictions of the mixture experiment are parameterised in the statistical model used to fit the data. The discussion will involve both main effects and interactions of the mixture and the process factors.