# ENBIS: European Network for Business and Industrial Statistics


# Submitted abstracts

**For the Fifth Annual ENBIS Conference**

*More information on this conference can be found on the events page.*

*In particular: see the conference programme*.

### Sessions

Poster presentations

1a. Data mining

1b. Statistical modelling in biostatistics

1c. Finance and risk management

2a. Workshop: Research methods in practice - opening up a toolbox

2b. Statistics education and training

2c. Best Manager and Young Statisticians Award and panel discussion

3a. Workshop: Reliability studies part 1

3b. Statistical modelling

3c. DoE general

3d. Statistical consulting

4a. Workshop: Reliability studies part 2

4b. Process modelling and multivariate methods

4c. DOE case studies

4d. Statistical consulting

5a. Workshop: Reliability studies part 3

5b. Process models and engineering process control

5c. DOE general

5d. Customer surveys and Kansei engineering

6a. Reliability, maintainability and safety case studies

6b. Statistical modelling

6c. DOE and multi-objective optimization

6d. Six Sigma

7a. Workshop: Advanced fields of statistical modelling

7b. Measurement processes and capability

7c. Six Sigma

8a. Algebraic statistics

8b. Process models and testing

8c. Workshop: Wild River DoE Workshop



### 1a. Data mining

**29. Discover Patterns in Categorical Time Series using IFS**

*Authors:* Christian Weiß (University of Würzburg) and Rainer Göb (University of Würzburg) *Keywords:* Sequential pattern analysis, visual data mining, iterated function systems, categorical time series. *Format:* presentation (Data mining) *Contact:* christian.weiss@mathematik.uni-wuerzburg.de

The detection of meaningful sequential patterns in categorical time series is an important task in science and industry, e.g. in linguistics, biology, safety engineering and telecommunication network monitoring. Various algorithms for finding frequent sequential patterns have been developed in the KDD (knowledge discovery from databases) literature. This paper presents an online approach for sequential pattern analysis. The method is based on iterated function systems (IFS), which are usually used for generating fractals. An IFS transforms the categorical time series in such a way that patterns can be analyzed with standard methods of cluster analysis. In particular, a variant of the procedure allows patterns to be detected visually.
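The abstract does not say which IFS is used, so the following is only a minimal sketch under stated assumptions. It uses the well-known "chaos game" construction: each category is assigned a corner of the unit square (so at most four categories), and every new observation pulls the current point halfway towards the corner of its category. The names `ifs_transform` and `cell`, the corner layout and the contraction factor 0.5 are illustrative choices, not the authors' procedure.

```python
def ifs_transform(series, corners, contraction=0.5, start=(0.5, 0.5)):
    """Map a categorical series to points in the unit square (chaos game)."""
    x, y = start
    points = []
    for symbol in series:
        cx, cy = corners[symbol]
        # Contract towards the corner that belongs to the observed category.
        x = contraction * x + (1 - contraction) * cx
        y = contraction * y + (1 - contraction) * cy
        points.append((x, y))
    return points


def cell(point, depth=2):
    """Grid cell of a point; with contraction 0.5 it encodes the last `depth` symbols."""
    side = 2 ** depth
    return (int(point[0] * side), int(point[1] * side))


# Hypothetical four-letter alphabet placed on the corners of the unit square.
corners = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1)}
points = ifs_transform(list("abab"), corners)

# Both occurrences of the pattern "ab" fall into the same grid cell.
print(cell(points[1]), cell(points[3]))
```

With this layout, subsequences that end in the same recent symbols land in the same small cell, so frequent patterns appear as dense clusters in a scatter plot of the points and can be passed to a standard clustering routine.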

**56. A comparison between two different methodologies to analyze Italian families expenditures: Association Rules and Canonical Correspondence Analysis**

*Authors:* Silvia Salini (University of Milan) and Paola Annoni *Keywords:* Family expenditures, Association Rules, Correspondence Analysis *Format:* presentation (Data mining) *Contact:* silvia.salini@unimi.it

The goal is to identify every possible pattern in Italian families' expenditures on some major groups of goods. Data collected by ISTAT (Italian Bureau of Statistics) during 2003 on a large sample of families are analyzed by two different data mining techniques: Association Rules and Canonical Correspondence Analysis. Association Rules are designed to find every possible association and/or correlation among large sets of data. The method identifies attribute values that occur most frequently together in a given dataset and enables researchers to uncover hidden patterns in large data sets. A typical and widely used example of Association Rules is Market Basket Analysis, where data are generally organized in a large cross-tabulation in which customers are the rows and expenditure typologies are the columns. The purpose of the analysis is to find associations among types of expenditures, i.e., to derive association rules that identify expenditure co-occurrences which appear with the greatest (co-)frequencies. These kinds of analyses are mostly used for applications in large-scale distribution and on-line businesses. The second method applied is a modified version of 'classical' Correspondence Analysis that is particularly suitable for extracting latent factors from large datasets. The method is known as Canonical Correspondence Analysis (CCA) and was originally designed to analyze ecological data. It is used here for its capability to uncover the link between families' expenditure patterns and some external variables, which are considered highly significant in explaining family behaviour. The interesting feature of CCA is that these 'explanatory' variables are embedded within the expenditure analysis, so that the technique can be regarded as a 'constrained' Correspondence Analysis. Finally, the results of the two methods are compared and discussed.
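To make the notion of an association rule concrete, here is a small sketch that computes support and confidence for rules between pairs of expenditure types. The `families` records are invented toy data, not the ISTAT sample, and `pair_rules` is a hypothetical helper restricted to two-item rules; practical analyses would use a full frequent-itemset algorithm.

```python
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    item_count = {}
    pair_count = {}
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # share of families buying both a and b
        if support < min_support:
            continue
        # Confidence of a -> b is the share of a-buyers who also buy b.
        rules.append((a, b, support, count / item_count[a]))
        rules.append((b, a, support, count / item_count[b]))
    return rules


# Toy data: each set lists the expenditure groups recorded for one family.
families = [
    {"food", "housing"},
    {"food", "housing", "transport"},
    {"food", "transport"},
    {"housing", "food"},
    {"clothing"},
]

for antecedent, consequent, support, confidence in pair_rules(families):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Support measures how often the two expenditure types co-occur overall, while confidence is directional, which is why the rule from housing to food can differ from the rule from food to housing.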

**100. Fuzzy Linguistic Regression**

*Authors:* Murat Alper Basaran (Hacettepe University) and Alper Basaran (Hacettepe University), Suleyman Gunay (Hacettepe University) *Keywords:* Fuzzy regression, regression *Format:* presentation (Statistical modelling) *Contact:* muratalper@yahoo.com

Productivity is one of the most important factors that companies evaluate in today's competitive world. Put simply, productivity is the ratio of output to input. This basic measure can be calculated from quantitative data, and more complex methods are available in the literature when quantitative data are available. However, factors other than measurable ones also affect productivity. These attributes are determined linguistically by experts or executives. Fuzzy set theory is a tool for incorporating this kind of data into the analysis. In our study, twenty leading textile companies were chosen to obtain information on the linguistically expressed attributes affecting their productivity. Based on this information, fuzzy linear regression is employed. In the literature, only linguistic input-output variables are employed. However, in our study, both crisp output - linguistic input and linguistic input - output variables are employed in fuzzy regression. The features of the results are applicable.

**114. Multivariate Models for Operational Risk Management**

*Authors:* Luciana Dalla Valle (University of Pavia) and Dean Fantazzini (University of Pavia), Paolo Giudici (University of Pavia) *Keywords:* Copulae, Bayesian Networks, Operational Risks, VaR *Format:* poster (Data mining) *Contact:* luciana.dallavalle@unimib.it

The management of Operational Risks has always been difficult due to the high number of variables to work with and their complex multivariate distribution. A Copula is used to build flexible joint distributions in order to model a high number of variables. A Bayesian Network is used to integrate, via Bayes' theorem, different sources of information, such as internal and external data. The goal of this paper is to propose these two approaches for modelling Operational Risks and to show their benefits with an empirical example.
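As an illustration of the copula idea only (the Bayesian Network part is not covered), the sketch below draws dependent losses for two risk categories from a Gaussian copula with exponential marginals and reads off a 99% Value at Risk for the total. The copula family, the marginals, the parameter values and the function name `gaussian_copula_losses` are all assumptions made for the example; the abstract does not state which were used.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_losses(n, rho, rate_a, rate_b, seed=0):
    """Simulate n pairs of dependent losses with exponential marginals."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    losses = []
    for _ in range(n):
        # Correlated standard normals carry the dependence structure.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Map to uniforms (the copula); clamp to avoid log(0) below.
        u1 = min(std_normal.cdf(z1), 1.0 - 1e-12)
        u2 = min(std_normal.cdf(z2), 1.0 - 1e-12)
        # Inverse exponential CDF gives each marginal loss distribution.
        loss_a = -math.log(1.0 - u1) / rate_a
        loss_b = -math.log(1.0 - u2) / rate_b
        losses.append((loss_a, loss_b))
    return losses


# Hypothetical parameters: correlation 0.7, mean losses 1.0 and 2.0.
losses = gaussian_copula_losses(5000, rho=0.7, rate_a=1.0, rate_b=0.5)
totals = sorted(a + b for a, b in losses)
var_99 = totals[int(0.99 * len(totals))]  # empirical 99% quantile of total loss
print(f"simulated 99% VaR of total loss: {var_99:.2f}")
```

Separating the dependence (the correlated normals) from the marginals (the inverse CDF step) is what makes the copula approach flexible: either part can be swapped without touching the other.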

### 1b. Statistical modelling in biostatistics

**21. Hierarchical Longitudinal Modeling of Air-Pollution Health Effects**

*Authors:* Michael Friger (Ben-Gurion University of the Negev) and Arkady Bolotin (Ben-Gurion University); Ronit Peled (Ben-Gurion University) *Keywords:* air-pollution effects; statistical modeling; hierarchical structural model *Format:* presentation (Statistical modelling) *Contact:* friger@bgu.ac.il

In this paper, we present a strategy and methodology for constructing models that describe air-pollution effects and for interpreting them in an epidemiologically reasonable way. The main assumption in our approach is that every health outcome is an element of a multivariate multilevel hierarchical system and depends on geophysical, meteorological, pollution, socio-cultural, physiological, demographical, and other factors. We developed a 3-stage strategy for model building: from the hierarchical structural model to the formal mathematical model to the specific statistical model. We propose different options for the hierarchy realization. The final model is built on the methodology of system analysis, Generalized Estimating Equations, Generalized Linear Models techniques, and time-series analysis. We developed a so-called "multi-layer" approach to the epidemiologically meaningful interpretation of the models. This approach allows dynamic changes in air-pollution effects to be considered. The proposed methodology was applied to modeling the health effects of air pollution with data collected from lung function measurements in a group of 115 asthmatic and healthy children in two cities (Ashqelon and Ashdod) in the period from February 2002 to September 2002 (5232 person-day records). The meteorological variables comprise daily maximum temperature, average humidity and barometric pressure; the air-pollution variables are NOx, SO2, ozone, and suspended particulate matter (PM2.5 and PM10). In addition, lagged air-pollution effects of up to 3 days were studied. The results of the models demonstrate a significant direct and indirect influence of PM and ozone concentration on lung function. Additionally, we found a 3-day delay in the effect of SO2 on the lung function of asthmatic children in Ashdod.

**61. Data science at Unilever's Foods Research Centre**

*Authors:* Henk van der Knaap (Unilever) *Keywords:* clinical trial, design of experiment, analysis of (co)variance *Format:* presentation (Design of experiments) *Contact:* Henk-van-der.Knaap@unilever.com

Unilever has a long and rich history of more than 50 years of performing foods and nutrition research. A key role of this research is to provide scientific evidence to support nutrition-health claims for Unilever products and to seek endorsement for this evidence from external experts and health organisations. In the markets in which Unilever operates, positions have already been established in the areas of Heart Health and Weight Management. Recent innovations include the development and marketing of foods and beverages with health benefits for children, both in the developed and developing world. Typically, nutrition-health claims are based on evidence from nutrition intervention trials with healthy consumers that are done to understand mechanisms of action. Statistics plays an essential role in designing and analysing these trials. This presentation will give an overview of these human trials and will highlight the role of statistical consultancy in setting up these trials and communicating their results. Topics that will be covered are the design of the study, power analysis and issues related to the analysis of the results, such as selecting relevant covariates.

**75. Modelling Hepatitis C Viral Load Pattern**

*Authors:* Jian Huang (University College Cork) and Kathleen O'Sullivan (Department of Statistics, University College Cork), John Levis (Department of Medicine, University College Cork), Elizabeth Kenny-Walsh (Department of Medicine, University College Cork), Liam J. Fanning (Department of Medicine, Univer *Keywords:* logistic model, hepatitis C viral load, mixed-effects model *Format:* presentation (Statistical modelling) *Contact:* kathleen.osullivan@ucc.ie

The hepatitis C virus (HCV) is a ribonucleic acid (RNA) virus that infects approximately 3% of the world's population. The viral load of HCV can be measured in serum by RT-PCR (a method based on amplification of genomic RNA). To date several studies have attempted to elucidate the fluctuations in viral load in both acute and chronic diseases (e.g. HCV, HIV). Previously, we have shown that viral load does change over time in some patients and exhibits periods of apparent stability in others. In this paper we outline a statistical model which describes the viral load pattern of HCV over time. The data used consisted of 147 untreated patients chronically infected with hepatitis C, each contributing between 2 and 10 years of measurements. Virus genotype, gender, age and infection source were also recorded. As significant patient-to-patient variation was evident from the individual viral load patterns, a nonlinear mixed-effects approach was employed to fit a three-parameter logistic model. By sequential modelling, the effects of genotype, gender, age and infection source were investigated. To assess goodness-of-fit, residual analysis was performed. The analysis showed that a three-parameter logistic model provided a good fit for describing the viral load patterns collected with varying frequency over different time intervals. Results of the modelling indicated that individuals infected by virus of genotype 1 had a significantly higher maximum viral load than those infected by genotypes 2 and 3. Also, the average viral load growth rate was significantly different between infection sources.
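The abstract does not give the parameterisation of the three-parameter logistic curve. A common form, assumed here purely for illustration, uses an asymptote (maximum level), a midpoint (time at which half the asymptote is reached) and a scale (inverse growth rate). In a mixed-effects setting each patient's parameters would be a population value plus a patient-specific deviation; the numbers below are invented.

```python
import math


def logistic3(t, asymptote, midpoint, scale):
    """Three-parameter logistic curve: asymptote / (1 + exp((midpoint - t) / scale))."""
    return asymptote / (1.0 + math.exp((midpoint - t) / scale))


# Hypothetical population-level (fixed-effect) parameters on a log10 viral load scale.
population = {"asymptote": 6.0, "midpoint": 3.0, "scale": 1.5}

# Hypothetical random effect: this patient's maximum load deviates from the population.
patient_asymptote = population["asymptote"] + 0.4

years = [0, 2, 4, 6, 8, 10]
curve = [
    logistic3(t, patient_asymptote, population["midpoint"], population["scale"])
    for t in years
]
for t, value in zip(years, curve):
    print(f"year {t}: predicted log10 viral load {value:.2f}")
```

Under this form, a higher maximum viral load corresponds to a larger asymptote and a faster growth rate to a smaller scale, which is how covariates such as genotype or infection source could enter the model.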

**79. Generalized linear mixed models applied to an analysis of water for injection in syringes.**

*Author:* Sille Esbjerg (Novo Nordisk A/S) *Keywords:* GLMM, random effects *Format:* presentation (Statistical modelling) *Contact:* sies@novonordisk.com

Batches of syringes with water for injection are analysed with regard to the amount of oxidisable substances. It is determined if the amount in the water is above or below a certain threshold value. The analysis results are dichotomous. The amount of oxidisable substances is believed to be a function of the age of the syringe, since oxidisable substances may be liberated from the rubber plunger in the syringe to the water. In this particular study some batches of syringes have been analysed several times at different ages and others have only been measured once or twice. A logistic regression model is used to investigate if the hypothesis of age-dependency is true. The covariance structure is modelled either directly or by the use of random effects in a generalized linear mixed model. The results stemming from the various models are compared.

### 1c. Finance and risk management

**28. Logit Regression versus Tree Regression in Forecasting the Insolvency Risk of the Customers of Automotive Financial Services**

*Authors:* Ennio Davide Isaia (University) and Alessandra Durio *Keywords:* Logit Regression, Tree Classification, Factorial Analysis *Format:* presentation (Statistical modelling) *Contact:* isaia@econ.unito.it

In this paper, in order to forecast the performance of instalment payments by the customers of an automotive financial service, we resort to a logit and a tree multivariate regression model. Our data set consists of 298,902 customers of an automotive financial service whose applications have already been successfully scored, and it includes 33 variables arising from merging different data sources. Some of these variables are used to construct our target variable, which enables us to discriminate between good and bad customers; in this way we observe a rate of bad customers of 6.77% in our data set. We first focus our attention on the problem of choosing the right variables to be plugged into both models, showing how a bad choice, which can arise when resorting to automatic variable selection methods, can strongly distort the forecasting power of the models. Furthermore, the chosen set of explanatory variables is validated by means of a confirmatory Factorial Analysis. We finally turn to the choice of the best model and to the check of its power; this is achieved on the basis of the results obtained by testing both models on a randomly selected control group, which is held out of sample.

**63. Estimating the Trend in Bank-Branch Deposits in the New York State Using Multilevel Models**

*Author:* Peggy Ng (York University) *Keywords:* Bank-branch deposits, multilevel models *Format:* presentation (Statistical modelling) *Contact:* peggyng@yorku.ca

When clusters are measured at irregular times or when panel data have changing membership, the classical linear modeling technique does not work because the traditional method gives equal weight to each cluster in estimating the mean regression line for the level-2 (outer) variables. The method also assumes sphericity in the variance structure. Multilevel models are designed to analyze such data by including different dependency structures in the determination of direct effects and cross-level interactions. This project examines the change in bank-branch deposits (performance) of a major American bank from 1994 to 2002 in New York State and how this change differs among counties (level-2) of various sizes. There are a total of 3452 branch-year records of panel data with changing membership (survivorship of the bank branch) and time-varying covariates through the period. Exploratory data analysis is utilized to identify branch-level profiles in terms of trends, change, and variability. Smoothing splines will be used to describe the county profile of the bank's deposits in this period. Empirical best linear unbiased predictors will be used to estimate the change in total deposits and the differences between county sizes. Effects will be adjusted for the County Income and Unemployment Rate of the county. The covariance structure of the mixed model will be estimated by maximum likelihood and restricted maximum likelihood methods, modeling the measurement errors, autocorrelation function, and the random effects. Bank-branch survivorship will also be modeled to supplement the understanding of the activities of the bank in the State.

**86. Six Sigma at a Big Bank. Experiences and Lessons**

*Authors:* Xavier Tort-Martorell (Technical University of Catalonia, UPC); Lluís Marco (Technical University of Catalonia (UPC)); Pere Grima (Technical University of Catalonia, UPC); Juan Manuel Ballesteros (BBVA, Banco Bilbao-Vizcaya-Argentaria) *Keywords:* Six Sigma in services, DMAIC methodology, Black Belt training *Format:* presentation (Six Sigma and quality improvement) *Contact:* lluis.marco@upc.edu

At the beginning of May 2004 we started a collaboration with BBVA, a large Spanish bank. The bank has almost 90,000 employees and branches all over the world, especially in South America. The presentation will explain the steps followed in the design of the program, which was carried out through close collaboration between managers from the bank and the UPC professors, and the pilot and final design of the program. Then we will cover in detail the syllabus of the Black Belt training, which covers methodological aspects and tools, both statistical (with MINITAB) and process-oriented (with iGrafx). The courses emphasize the concepts and the use of the tools and are delivered through many examples, games and practical team work. Course duration is 11 days, grouped in four sessions of 4, 3, 3 and 1 days over a period of three to four months. In 8 groups, we trained around 180 BBs, 80 in Spain and 100 in Latin America. All of them carried out an improvement project as part of the training and certification process. After explaining the general framework of the designed program, we will present the lessons learned in several aspects: organizational, methodological (in the sense of the way to apply the DMAIC steps), and regarding the statistical tools most useful in the projects. We will also share some of our experiences during the process.

**88. Predicting Development of Insurance Agencies**

*Authors:* Winfried Theis (MSR Consulting Group) and Michael Schönewald (MSR Consulting Group) *Keywords:* Case Study, Prediction, Business Agencies, Insurance *Format:* (Statistical consulting) *Contact:* Winfried.Theis@msr.de

In this case study our client supplied us with information on the development of his nearly 300 agencies in Germany over four years. We had to answer the question of how the agencies develop on average, depending especially on their age and size, so that this can be used as a tool to assess the future performance of an agency. Because of the small number of years, and the fact that one of them was especially bad for the insurance business in general, we decided against a complex repeated measurements model. Instead, we fitted a complete linear model with all possible influences and selected the most important ones by stepwise regression, using all observations where the development in the number of contracts from one year to the next was available. We used this model for the first-year prediction. For the following years we constructed a much simpler model using the lagged predictions. Although we did not clean the data for contracts being moved from one agency to another, or for similar atypical developments, the resulting predictions are close enough to the true values that an assessment of the performance of the agency is possible. We finally created a tool from the models to enable the client to quickly perform some predictions.

### 2a. Workshop: Research methods in practice - opening up a toolbox

### 2b. Statistics education and training

**95. Can six sigma really help a SME?**

*Authors:* Tony Fouweather (ISRU (Industrial Statistics Research Unit)) and Ian Fouweather *Keywords:* Six sigma, training, SME *Format:* presentation (Six Sigma and quality improvement) *Contact:* tony.fouweather@ncl.ac.uk

Six sigma training can be of great benefit to most companies as it gives them opportunities to become more efficient and competitive. The cost of this training is often too high for SMEs, leaving them at a severe disadvantage to their larger competitors who can afford the cost and time to train an in-house six sigma specialist. For very small companies with few employees the problem can be magnified. Through funding gained from the European Commission, ISRU was able to address this problem directly by offering hugely discounted six sigma training to local SMEs. This paper will seek to demonstrate, through case studies, how statistical techniques can be applied to facilitate improvements in efficiency, reduction in waste/rejects and the general improvement of processes, and how this in turn can improve the prospects of an SME. A small local bakery based in Sunderland with 30 employees sent a delegate on an ISRU training course with the aim of learning six sigma techniques to allow them to improve their processes. As part of the black belt course the delegates each bring a problem from their own company as their project. “The Six Sigma training gave us a set of tools which allowed us to improve the efficiency of our packing line for one of our most difficult products.” Another case study shows how a local chemical company used modelling techniques to increase their profit. They wanted to predict the time taken for a batch of chemical to dry depending on certain factors which varied from batch to batch. The delegate was able to model the drying process with the tools he had learnt on the training course, and through the resulting predictive model the company was able to produce an extra batch each week. Each batch represented £6000 profit for the company, so this led to a potential gain of £300000 per year.

**70. Challenges of teaching control charts in the workplace**

*Authors:* Arthur Bakker (University of London) and Phillip Kent; Richard Noss; Celia Hoyles (University of London) *Keywords:* training, control charts, context *Format:* presentation (Statistical consulting) *Contact:* a.bakker@ioe.ac.uk

The main goals of our research are to characterise the statistical and mathematical knowledge and skills people need at work and to design learning opportunities to improve their knowledge and skills (www.ioe.ac.uk/tlrp/technomaths). One of the areas we have investigated is Statistical Process Control in pharmaceutical and packaging sectors. For this discussion paper, we draw on observations of training in a pharmaceutical company supplemented by the results of a workshop we organised ourselves with shift leaders and managers in the food industry, using educational software. In the pharmaceutical training we noted that SPC was taught in a rather abstract way: for example, the SPC charts used were standard examples but different from the ones used at the shop-floor and no opportunities were created for employees to link their shop-floor experiences to the theory of SPC. This might explain the frustration of the trainers that employees did not use what they learnt in the courses. In our own workshop we tried to gain further insight into the ways employees may be supported to coordinate statistical and industrial perspectives. It turned out that the meanings employees attributed to a range of concepts including mean, variation, target, specifications, trend and scale in relation to control charts depended heavily on the context in which they were working. Another conclusion is that to make a real difference in statistical training we should probably create opportunities for employees to link the statistical theory as presented in textbooks with the issues that are important for them from their industrial perspective.

**9. Enabling clients to efficiently analyse their data**

*Author:* Andrew Jack *Keywords:* software, data management, workflow *Format:* presentation (Statistical consulting) *Contact:* andrew.jack@iname.com

A statistical consultant should aim to empower their client to understand and undertake analyses themselves, without expert help. To achieve this aim, good statistical software tools are essential. The process of analysis is usually bigger than just applying a statistical test: there are often significant issues of data access, management and cleaning, which are of fundamental importance to the analysis process. Unfortunately, this part of the analysis is often done “by hand” in Excel, which means that the data is subject to possible human error, and the process by which the data is managed is unrecorded and therefore unauditable. This paper shows how “workflow” software tools originally designed for Data Mining were adapted for use by scientists carrying out their own statistical analysis. This allowed the whole analysis process to be integrated within one software package, resulting in greater efficiency and accuracy, and automatically generating a clear audit trail for each analysis process. This reduced the time spent by the consultant in troubleshooting software issues for the scientists.

### 2c. Best Manager and Young Statisticians Award and panel discussion

Jaap van den Heuvel, Jeroen de Mast, Jesus Palomo. Panel discussion on the relevance of statistics to business and industry.

### 3a. Workshop: Reliability studies part 1

### 3b. Statistical modelling

**54. A Bayesian approach to critical chain and buffer management**

*Authors:* Fabrizio Ruggeri (CNR IMATI) and Enrico Cagno, Franco Caron and Mauro Mancini (Politecnico di Milano, Italy) *Keywords:* project management; Bayesian models *Format:* presentation (Statistical modelling) *Contact:* fabrizio@mi.imati.cnr.it

Execution of activities in due time is a critical aspect in project management since realisation of industrial plants (or other similar projects) after the contractual time causes heavy losses to the contractor, both financially and in terms of bad reputation. Different methods have been proposed in literature to control activities during projects; in particular, the authors (jointly with Palomo and Rios Insua) considered a Bayesian dynamic linear model to forecast delivery times of items provided by subcontractors. The current paper stems from work by Goldratt on critical chains and provides a Bayesian model, sounder from a mathematical point of view than Goldratt's, to determine times for realisation of activities. In this approach, each activity has to be performed within a given time and a buffer time is introduced at the end of each "critical path" to recover from excesses in the previously scheduled activities. In particular, past data and experts' opinions are used to forecast the (posterior predictive) distribution of time realisations for each individual activity and for all the activities within a critical path. Times for each activity and the final buffer are assigned according to some quantiles of such distributions.
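The quantile-based assignment of activity times and buffer can be sketched by simulation. This is not the authors' Bayesian model: the posterior predictive distribution of each activity is replaced here by a lognormal with invented parameters, the Bayesian updating from past data and expert opinion is omitted, and the quantile levels (median for activities, 90% for the path) and the name `schedule_with_buffer` are assumptions for illustration.

```python
import random


def quantile(sorted_values, q):
    """Empirical quantile of an already sorted list."""
    index = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def schedule_with_buffer(activities, q_activity=0.5, q_path=0.9, n_draws=10000, seed=1):
    """Assign activity times and a path buffer from simulated predictive draws."""
    rng = random.Random(seed)
    # One list of simulated durations per activity (stand-in for predictive draws).
    draws = [
        [rng.lognormvariate(mu, sigma) for _ in range(n_draws)]
        for mu, sigma in activities
    ]
    # Each activity is scheduled at a quantile of its own distribution.
    activity_times = [quantile(sorted(d), q_activity) for d in draws]
    # The path duration is the sum of the activities within each simulated scenario.
    path_totals = sorted(sum(scenario) for scenario in zip(*draws))
    path_time = quantile(path_totals, q_path)
    # The buffer covers the gap between the path quantile and the scheduled times.
    buffer = max(path_time - sum(activity_times), 0.0)
    return {"activity_times": activity_times, "buffer": buffer}


# Hypothetical critical path: (log-scale mean, log-scale sd) of durations in days.
critical_path = [(2.0, 0.3), (2.5, 0.4), (1.5, 0.5)]
plan = schedule_with_buffer(critical_path)
print(plan)
```

Pooling the safety margin into one buffer at the end of the path, rather than padding every activity, reflects the critical chain idea described in the abstract: delays and early finishes partly cancel out along the path.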

**69. AN INTEGRATED MULTIVARIATE ENGINEERING STATISTICAL PROCESS CONTROL SYSTEM IN A CONTINUOUS POLYMERIZATION PROCESS**

*Authors:* Susana Barceló (Technical University of Valencia) and Javier Sanchis; Alberto Ferrer *Keywords:* engineering process control, multivariate statistical process control, DMC controllers, PCA models *Format:* presentation (Process modelling and control) *Contact:* sbarcelo@eio.upv.es

Engineering process control (EPC) and statistical process control (SPC) are two complementary strategies for quality improvement that until recently have developed independently. EPC is usually applied to minimize output variability by making regular adjustments to some compensatory processing variables. On the other hand, SPC monitoring procedures seek to reduce output variability by detecting and eliminating assignable causes of variation. In this talk we will describe a case study of integrating the EPC and SPC approaches in a continuous polymerization process (high-density polyethylene) to reduce polymer viscosity and to maximize productivity. This work is an extension to the multiple-input-multiple-output case of previous research (Capilla et al., 1999). The manipulated process variables are reactor temperature (T) and ethylene flow (E), whose changes represent negligible cost when compared to off-target viscosity or low productivity. The controlled variables are the key quality characteristic, that is, polymer viscosity, which is measured by melt index (MI), and a productivity index (APRE), worked out by energy balance. To develop this control system, a discrete linear model that characterizes the process dynamics has previously been identified and estimated from data collected in closed-loop operation (Barceló et al., 2003). Model Predictive Control (MPC) has been applied to develop the engineering control part of the system. MPC refers to a class of control algorithms that need an explicit model to predict the future response of a plant to be controlled. Specifically, the DMC (Dynamic Matrix Control) algorithm (Cutler and Ramaker, 1979) has been used. Multivariate projection methods have been applied (Kourti and MacGregor, 1996) to implement the SPC component. From the different variables resulting from the control system (process adjustments, output deviations, …) when the process is known to operate “in-control”, a PCA model with A components has been estimated. Hotelling T² and SPE (squared prediction error) charts have been constructed for monitoring future process performance. The behaviour of this integrated multivariate ESPC will be illustrated by simulating out-of-control signals.

Barceló, S., Vidal, S., and Ferrer, A. (2003). "A Case Study of Comparison of Multivariate Statistical Methods for Process Modelling". 3rd European Network Business and Industrial Statistics Conference. Barcelona.

Capilla, C., Ferrer, A., Romero, R. and Hualda, A. (1999). "Integration of Statistical and Engineering Process Control in a Continuous Polymerization Process". Technometrics 41 (1), 14-28.

Cutler, C. R. and Ramaker, B. L. (1979). "Dynamic Matrix Control - a Computer Control Algorithm". AICHE annual meeting. Houston, TX.

Kourti, T. and MacGregor, J. F. (1996). "Multivariate SPC Methods for Process and Product Monitoring". Journal of Quality Technology 28 (4), 409-28.

**74. Using data based models to improve the design flow in the development of microsystems**

*Authors:* Daniel Herrmann (Bosch GmbH) and Matthias Maute (Robert Bosch GmbH) *Keywords:* regression estimation, Gauss process, DACE, design flow, microsystems *Format:* presentation (Statistical modelling) *Contact:* daniel.herrmann@de.bosch.com

Robert Bosch GmbH is one of the world's largest suppliers of microsystems for automotive applications such as pressure, airflow, acceleration and yaw rate sensors. The design of microsystems is a highly demanding task. Manufacturing tolerances influence the functionality of the sensor, and manufacturing variations result in probability distributions of the functional sensor parameters. Since knowledge of these probability distributions is crucial for the design of microsystems, tools and methodologies are needed to calculate and optimise them. For this, detailed modelling of the sensor using state-of-the-art computer-aided engineering techniques is indispensable. Modelling is done on different levels of abstraction, including geometry level, network-type level and system level. A key factor for fast and efficient development of complex multi-domain systems is a seamless design methodology for all levels of abstraction. Here we discuss the following gap to be bridged. The sensor, or part of it, is modelled in great detail on a geometric level and usually solved using finite element methods implemented in codes like ANSYS, ABAQUS or FEMLAB. The simulation is used to design and optimize the sensor or a part of it. Typically the simulation time prohibits transient or Monte Carlo analyses. Therefore, behaviour models of the sensor are used at the next higher description level. To obtain the required accuracy of the behaviour model, the fine structure of the sensor also has to be included, since it has a significant influence on the signal. At the moment the behaviour model is tuned by hand, or it is derived from the geometric level by some kind of order reduction method. We suggest tuning parts of the behaviour models using nonparametric regression estimation, e.g. Gaussian processes, support vector machines or neural networks.
In the engineering context we call this approach data based model generation (DMG); it has been used in several contexts (see DACE and kriging). Here we discuss the application-specific conditions of this method for microsystems, where reliability of the model is crucial in the development of automotive safety systems. We present some successful applications of the method.

**94. Experimental and Numerical Based Sensitivity Calculations for Micro Scale Robust Engineering Design**

*Authors:* Mark Perry (London School of Economics) and Henry Wynn, Ron Bates (LSE) *Keywords:* Robust Design; Bond graph modelling; Dynamic Sensitivity Analysis *Format:* presentation (Statistical modelling) *Contact:* m.perry@lse.ac.uk

Robust Engineering Design (RED) refers to the group of methodologies dedicated to increasing the tolerance of engineering system performance to statistical variations arising from system manufacture and use. In addition to design of experiments, RED now includes computer experiments on CAD/CAE simulation, advanced Response Surface Modelling methods, adaptive optimisation methodologies and reliability/sensitivity analysis. However, although RED has been shown to be very effective in improving product or process design through its use of experimental and analytical methods, it has not been fully developed on the micro scale, where statistical variations can become relatively more important. In the work presented here, experimental measurements, bond graph modelling, numerical simulations and sensitivity analysis are combined into a strategy that provides a platform for the robust design of a micro scale mechatronic device, namely a behind-the-ear (BTE) hearing aid. Two key components of the hearing aid device, the telecoil and the receiver, are considered, and the robustness analysis is performed with respect to variations in the spatial placement of these components.

**98. Bayesian inspection for large industrial systems**

*Authors:* Gavin Hardman (University of Durham) and Michael Goldstein (Univ. Durham) & Philip Jonathan (Shell) *Keywords:* Bayesian methods; DLM; multivariate systems *Format:* presentation (Statistical modelling) *Contact:* g.a.hardman@durham.ac.uk

We aim to develop efficient Bayesian methods for the design of inspection schemes for industrial systems with many components. The problem of optimal inspection of large systems poses many difficulties for Bayesian design. We propose a model for the degradation of industrial systems with multiple components. The objective of the modelling is to provide a framework which will allow us to model and analyse different components jointly. We also attempt to construct the model with a view to simplifying the design of an optimal inspection and maintenance policy. The model consists of a multivariate dynamic linear model (DLM) plus an independent extreme (minimised) error term. The minimisation is used to reflect the nature of the observation process and can be altered accordingly. The DLM is used to model the general trend in a component, and the interactions between these trends. We demonstrate how the model can be updated (jointly, across components) using observational data, and how to predict from the model. This step is performed using a form of Bayesian updating. The model is illustrated by an example based on three real world systems; details of the systems are provided by Shell and the analysis builds on their current inspection practice. A subset of the data is used to estimate initial conditions and the inter-component covariance structure. A second subset is used for model validation. This work has been carried out as part of a PhD EPSRC CASE studentship, under the supervision of Michael Goldstein (University of Durham) and Philip Jonathan (Shell Global Solutions).

### 3c. DoE general

**6. Modifying a CCD to Model Both the Process Mean and Process Variance Under Randomization Restrictions**

*Authors:* Scott Kowalski (Minitab Inc.) and G. Geoffrey Vining, Douglas C. Montgomery, Connie M. Borror *Keywords:* Split-Plot, Response Surface Methodology, Dual Responses *Format:* presentation (Design of experiments) *Contact:* skowalski@minitab.com

The response surface methodology (RSM) framework for designed experiments has been widely adopted in practice. Industrial experimenters often encounter situations where some experimental factors are hard to change or where there is a significant discrepancy in the size of some experimental units. In both of these situations, the experimental units for some factors are observational units for other factors, which leads to split-plot designs. An important question within industrial statistics is how to find operating conditions that achieve some goal for the mean of a characteristic of interest while simultaneously minimizing the characteristic's process variance. This talk establishes how one can modify the common central composite design to allow the estimation of separate models for the characteristic's mean and variance within a split-plot structure. The appropriate analysis of the experimental results will be discussed.

**8. Bootstrap Analysis of Designed Experiments: Part II**

*Authors:* Ron Kenett (KPA Ltd.) and E. Rahav (Tel Aviv University) and D. Steinberg (KPA and Tel Aviv University) *Keywords:* Design of Experiments, Bootstrapping, Missing Data, Split-Plot *Format:* presentation (Design of experiments) *Contact:* ron@kpa.co.il

At the first ENBIS conference in Oslo, Kenett and Steinberg proposed applying bootstrapping to the analysis of data derived from moderately sized designed experiments (Kenett and Steinberg, 2001). In this presentation we present follow-up work with emphasis on designed experiments with missing data, heteroscedasticity and constraints related to the structure of the experiment, such as split-plotting. Our results show clear advantages to bootstrap based data analysis, which is both robust and easy to apply. The bootstrap analysis often points to problems in linear model analyses that might easily be overlooked. These findings suggest that bootstrapping should be a requirement in the curriculum of engineers and scientists.
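To make the idea concrete, the following is a minimal sketch of one simple bootstrap scheme for a replicated two-level factorial: responses are resampled with replacement within each treatment cell and a percentile interval is formed for a main effect. The abstract does not state which resampling scheme the authors use, so the scheme, the data and the function names (`effect_of_a`, `bootstrap_interval`) are all illustrative assumptions.

```python
import random
from statistics import mean

# Hypothetical replicated 2^2 experiment: (level of A, level of B) -> responses.
data = {
    (-1, -1): [20.1, 19.4, 21.0, 20.3],
    (+1, -1): [24.8, 26.1, 25.2, 24.5],
    (-1, +1): [21.2, 20.7, 22.5, 21.9],
    (+1, +1): [26.9, 25.7, 27.4, 26.0],
}


def effect_of_a(cells):
    """Main effect of A: average cell mean at A=+1 minus average cell mean at A=-1."""
    high = mean(mean(y) for (a, _), y in cells.items() if a == +1)
    low = mean(mean(y) for (a, _), y in cells.items() if a == -1)
    return high - low


def bootstrap_interval(cells, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval, resampling within each treatment cell."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_boot):
        resampled = {cell: rng.choices(y, k=len(y)) for cell, y in cells.items()}
        estimates.append(effect_of_a(resampled))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


estimate = effect_of_a(data)
lower, upper = bootstrap_interval(data)
print(f"effect of A = {estimate:.3f}, 95% bootstrap interval = ({lower:.3f}, {upper:.3f})")
```

Resampling within cells respects the treatment structure of the design; a split-plot experiment would need the whole plots, not individual observations, to be the resampling units.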

**12. Adapting Response Surface Methodology for Computer Experiments**

*Author:* Geoff Vining (Virginia Tech) *Keywords:* Design of Experiments, Space-Filling Designs, Nonparametric Regression *Format:* workshop (Design of experiments) *Contact:* vining@vt.edu

Response surface methodology (RSM) is a powerful experimental strategy for optimizing products and processes. RSM originated in the chemical process industry (CPI), but it has found widespread application beyond the CPI, including finance. Traditional RSM is a sequential learning process that assumes a relatively small number of important factors. It also assumes that low order Taylor series approximations are appropriate over localized experimental regions. Engineers and scientists are using computer and simulation models as the basis for product and process design. These models are extremely complex and typically highly nonlinear. Analytic solutions do not exist. As a result, computer experiments are becoming an important basis for optimizing such systems. Computer experiments present several challenges to RSM. First, the underlying models are deterministic, which means there is no random error. In some cases, random errors are introduced, especially in simulation experiments. Nonetheless, the models themselves are deterministic. Second, these experiments often involve a relatively large number of potential factors. Traditional RSM designs prove much larger than can actually be conducted.

**60. A pilot procedure for eliminating substandard designs due to the unavailability of data**

*Author:* Janet Godolphin (University of Surrey) *Keywords:* confounding; connectivity; factorial design; missing observations *Format:* presentation (Design of experiments) *Contact:* j.godolphin@surrey.ac.uk

The loss of one or more observations during a designed experiment is a hazard well known to statistical practitioners, including practicing engineers involved in product design and development. When missing values occur, the eventual experimental design is necessarily altered from the design that was selected originally, and this can lead to difficulties with unwelcome consequences. Even if the available software allows for missing values in the analysis of the surviving data, there can be the serious repercussion that the eventual design is disconnected with respect to some factors under investigation. Typically the experiment is badly damaged in this situation since many if not all of the factorial effects will be confounded with block factors, making it impossible to formulate adjusted main effect and interaction sums of squares as well as the blocks sums of squares components for the analysis of variance. However, it is shown that the risk of this situation arising can be reduced substantially by judicious design selection at the planning stage of the experiment based upon the approach of Godolphin (2004). A pilot procedure is derived for the avoidance of substandard designs that are acutely vulnerable to disconnectivity through observation loss. These points are illustrated by considering a problem from the semiconductor industry. It is demonstrated that a cyclic design consisting of three replicates of a full 2x2x2 factorial can become disconnected by the loss of only two observations. When this occurs it is not possible to obtain unbiased estimates of any of the seven factorial effects, which is clearly a disastrous consequence for the experiment particularly if the costs associated with the 24 tests are substantial. However, by use of the pilot procedure, the practitioner is able to recognise the vulnerability of this and similar designs before any experimentation is attempted. 
The pilot procedure is a useful aid in the selection of a design, which is at a substantially smaller risk, in the sense that up to seven observations may be lost without incurring the possibility of a disconnected eventual design.

Reference: Godolphin, J.D. (2004). Simple pilot procedures for the avoidance of disconnected experimental designs. Applied Statistics, 53, 133-147.

**108. Searching active factors in supersaturated designs**

*Author:* Anthony Cossari (University of Calabria) *Keywords:* Supersaturated designs, active factors, Bayesian model selection *Format:* presentation (Design of experiments) *Contact:* a.cossari@unical.it

Supersaturated designs can be used for screening purposes to investigate a number of factors which is at least as large as the number of experimental runs. However, analyzing data from such experiments is a difficult task, because of the nonorthogonality of the supersaturated designs. In recent years several analysis methods have been proposed, which are effective in some aspects. In this paper the results from supersaturated designs are interpreted by using an existing Bayesian all-subsets selection method, which is particularly useful to uncover the active factors in fractionated designs, both orthogonal and nonorthogonal, through the appropriate computation of the posterior probabilities for each one of the possible subsets of active factors. By restricting search of important factors to a main-effect analysis, as is typically the case for supersaturated designs, the Bayesian method turns out to be more effective than other existing analysis methods. It is also used to explore the possibility that two-factor interactions are involved in the explanation of data. A well-known dataset and simulations are used to illustrate the ideas.

### 3d. Statistical consulting

**15. A Factorial Design Planning Process**

*Authors:* Pat Whitcomb (Stat-Ease, Inc.) and Shari Kraber *Keywords:* DOE, design of experiments, factorial design *Format:* presentation (Design of experiments) *Contact:* pat@statease.com

Newcomers to factorial design find it difficult to choose appropriate designs with adequate power. In this presentation, we introduce a clear process to determine the best design that fits the problem. A discussion of statistical power will show attendees how the size of the effect relative to the noise is a critical criterion in design selection. We will also discuss how to choose the ranges for the input factors, the importance of evaluating aliases, and checking runs for safety. Attendees will take away a clear strategy for determining which design is appropriate for their data analysis needs.

**40. Cost Model Building for Benchmarking Purposes: Statisticians Working Closely With Chemical Engineers**

*Author:* Niels Dekker *Keywords:* Communication, Consultancy, Model Building, Teamwork *Format:* presentation (Statistical consulting) *Contact:* ndekker@ipaglobal.com

Independent Project Analysis, Incorporated (IPA, Inc.) improves the competitiveness of its customers by enabling more effective use of capital in their business. Over the last 15 years, IPA has developed detailed, carefully normalized databases that contain data about the entire project life cycle from the business idea through early operation. We use these data to develop powerful statistical tools that enable us to compare project performance in numerous areas and identify practices that result in superior project outcomes for cost, schedule, operability, and safety. IPA's research and methodology are applied to assist project teams in improving the execution and competitiveness of their capital projects. This process is known as benchmarking. One of the steps in this process is building cost capacity models. These models benchmark a project's cost using its technical characteristics, based on the historical performance of a set of similar completed projects. This presentation will discuss how research analysts (statisticians) interact with project analysts (chemical engineers) to build such models. We will also discuss procedures that IPA has put in place to ensure the correct use of these models by project analysts. A recent example of a gas processing plant model will be used to illustrate this work process. In addition, the model itself will be briefly touched on.

**77. Statistical Significance versus Practical Significance - Why a Hypothesis Test and a Confidence Interval related to the same parameter provide supplementary information**

*Authors:* Oystein Evandt (ImPro) and Shirley Coleman (University of Newcastle upon Tyne) *Keywords:* Statistical Significance, Practical Significance, Hypothesis Test, Confidence Interval *Format:* presentation (Statistical modelling) *Contact:* oystein.evandt@nsn.no

A frequently occurring task in statistical analysis is to draw inferences about the value of one or more parameters in a statistical model, or about the value of a function of one or more parameters. The model may be just a probability distribution. Functions of parameters of different models may also be of interest, e.g. the difference between the means of two independent distributions. It is often possible both to perform a hypothesis test regarding the quantity of interest, and to compute a confidence interval for it. A confidence interval for a quantity, having confidence coefficient 1-alpha, provides a test with significance level alpha for the same quantity, where a one-sided (two-sided) test corresponds to a one-sided (two-sided) confidence interval. A hypothesis test gives information on whether the data and the test method in question give a statistically significant deviation from the null hypothesis in question, at the chosen significance level. However, a hypothesis test alone does not give subject matter experts the possibility to evaluate whether a statistically significant result is also of practical significance, no matter how small the significance probability is. The values of the endpoints of a confidence interval do, however, give such a possibility. Therefore a confidence interval should be computed whenever possible. In addition, the significance probability of the test that goes with the confidence interval should be computed if this test is not weaker than the strongest test available. In "standard situations" this is usually the case. In practice, however, often only a test is performed. In this paper the meaning of statistical significance and practical significance is accounted for, and examples are given of how confidence intervals can be used to evaluate whether a statistically significant result is also practically significant.
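The distinction can be illustrated with a small numerical sketch. It assumes a one-sample z-test with known standard deviation, and the sample size, observed shift and "smallest practically relevant shift" are hypothetical numbers chosen for illustration; none of them come from the paper.

```python
from math import sqrt
from statistics import NormalDist


def z_test_and_interval(mean_diff, sd, n, alpha=0.05):
    """Two-sided z-test of H0: shift = 0, plus the matching (1 - alpha) interval."""
    se = sd / sqrt(n)
    z = mean_diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * se
    return p_value, (mean_diff - half_width, mean_diff + half_width)


# Hypothetical data summary: a tiny observed shift measured on a very large sample.
p_value, (lower, upper) = z_test_and_interval(mean_diff=0.05, sd=1.0, n=10_000)

# Hypothetical threshold set by subject matter experts.
smallest_relevant_shift = 0.5

statistically_significant = p_value < 0.05
practically_significant = lower > smallest_relevant_shift

print(f"p = {p_value:.2e}, 95% CI = ({lower:.4f}, {upper:.4f})")
print("statistically significant:", statistically_significant)
print("practically significant:", practically_significant)
```

Here the test rejects the null hypothesis decisively, yet the whole interval lies far below the threshold, which is exactly the information the p-value alone cannot convey.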

**82. Statistical computing in a diverse work context**

*Author:* John Logsdon (Quantex Research Ltd) *Keywords:* Computing, collaboration, consultancy *Format:* presentation (Statistical consulting) *Contact:* j.logsdon@quantex-research.com

Two of the major impediments both to statistical work and to collaboration with colleagues, whether in the same extensive institution or elsewhere, are what software to use and where to use it. Clients are understandably reluctant to allow outside access to their in-house systems, and consultants therefore have to fund their own computing. Where a particular package is required, this can become very difficult and militate against using the optimum statistical solutions. Moving data between systems can also be a time-consuming procedure and does not allow fluent manipulation by client staff and consultant. Another problem exists for software houses: how to market their products. Do they sell a lot cheaply or a few at a high price? They need an alternative approach where they can control who uses the software and be paid on a pay-by-use basis, including periodic support payments. This can apply to small innovative packages as much as to the major programs. The industrial-statistics.com web site and associated server are now operational and solve these problems. We will review the facilities on the web site, including the administration system that enables the user to set up accounts on the statistics server, manage groups and interact with other users on the system. We will demonstrate the mechanism for administering software on the server, from both the account user's and the software owner's viewpoint, which can provide the owner with either an additional or a sole income stream. We will show how users can request and be granted access to a common group directory on a temporary basis. Hopefully we will be able to do all this on the live system. We will also discuss the security issues and plans for the future.

**83. Implementation of DOE in a technology- and quality-driven company**

*Author:* Gernot Schubert (Hilti AG) *Keywords:* implementation of methods, training, knowledge management *Format:* presentation (Statistical consulting) *Contact:* gernot.schubert@hilti.com

To enhance its product leadership in technologically leading products and systems for construction professionals worldwide, Hilti started to spread the use of DOE among its research, development, supply chain and quality departments. This presentation will show the different stages of the implementation and the support of the participants for the program. Initially, the need for the method was identified by a small number of enthusiasts among the company's engineers. After acquiring knowledge of DOE, receiving external support and creating quick internal successes by applying DOE, the "method owner" began to gain the commitment and support of the management. This offered sufficient freedom to begin to apply DOE in the areas of mid- and long-term process and technology improvement. In the next stage it was possible to create real fans of the method among engineers, which led to a demand for further training. After courses for potential "super users" of DOE, a "community of practice" was implemented to provide support and a platform to share experience and knowledge. In parallel, DOE was introduced to indirectly involved persons (non-experts, i.e. project managers, test engineers) in development, engineering and quality to show the benefits and the correct point of application of the method.

### 4a. Workshop: Reliability studies part 2

### 4b. Process modelling and multivariate methods

**7. A Multiscale Approach for the Monitoring of Paper Surface Profiles**

*Authors:* Marco Reis (Department of Chemical Engineering, University of Coimbra) and Pedro M. Saraiva (Department of Chemical Engineering, University of Coimbra, Portugal) *Keywords:* Multiscale analysis; Wavelets; Statistical process control; Process monitoring and diagnosis; Paper surface *Format:* presentation (Process modelling and control) *Contact:* marco@eq.uc.pt

Paper is a complex material, exhibiting properties that derive from a structural hierarchy of arrangements for different elements (molecules, fibrils, fibres, network of fibres, etc.), beginning at a scale of a few nanometres and proceeding all the way up to a few dozen centimetres or even metres. This complexity is also present at its boundary, the paper surface, which plays a central role in many of the relevant properties from the perspective of the end user, such as general appearance (optical properties, flatness, etc.), printability (e.g. the absorption of ink) and friction features, to name a few. Accurate paper surface profiles contain the fundamental raw information of the surface for a wide range of length-scales, to which different aspects of the paper quality are connected. With the goal of exploiting the availability of such paper surface data, obtained through a mechanical stylus profilometer, we present an approach for setting up a Multiscale SPC procedure that simultaneously monitors two key quality surface phenomena expressed at different scales: roughness and waviness. Raw profiles, after adequate processing using a multiscale framework based on wavelets, give rise to quantities that can be effectively used to monitor these two phenomena in a simple and integrated way, and can therefore be implemented in practice for quality control purposes. The effectiveness of the proposed procedure is assessed by simulation as well as through a pilot study involving real paper surface profiles.

**37. Multivariate Control Charts based on Bayesian State Space Models**

*Author:* Kostas Triantafyllopoulos (University of Sheffield) *Keywords:* Control charts, state space models, Kalman filtering, Bayesian forecasting, time series, process control *Format:* presentation (Process modelling and control) *Contact:* kostas@sheffield.ac.uk

Control charts have always been at the centre of interest in statistical process control. The available literature offers numerous references on control charts, although multivariate control charts have not been discussed so extensively. Recently there has been a notable interest in multivariate control charts (Liu, 1995). Multivariate correlated data can be monitored with exponentially weighted moving average control charts as described in Rigdon (1995) and Yeh et al. (2003). Pan and Jarrett (2004) study control charts based on state space models. They concentrate mainly on vector autoregressive time series and they adopt the Kalman filter approach to estimation and prediction. In this paper we follow their general idea of developing control charts, but we showcase the practical applicability of state space models considering local level, trend and seasonal processes for simulated and real data. We adopt the Bayesian framework for estimation and forecasting and so we model the cross-correlation of the multivariate data with appropriate distributions. We consider situations where the variance-covariance matrix of the measurement noise (a) is constant over time and (b) changes over time, and we estimate this covariance matrix using the data. As well as using general measures of goodness of fit (e.g. mean squared forecast error and mean absolute percentage error), we use Bayesian monitoring and adaptation based on Bayes factors. In our study we closely examine the influence of the prior distributions of the parameters, something usually overlooked or not appropriately discussed in the state space literature. We therefore propose a complete scheme for multivariate control charts applied to a wide range of multivariate correlated time series. The method offers increased flexibility and adaptability compared with other procedures for multivariate control charts.

References

Liu, R.Y. (1995) Control charts for multivariate processes. J. Amer. Stat. Assoc. 90, 1380-1387.

Pan, X. and Jarrett, J. (2004) Applying state space to SPC: monitoring multivariate time series. J. App. Stat. 31, 397-418.

Rigdon, S.E. (1995) A double-integral equation for the average run length of a multivariate exponentially weighted moving average control chart. Stat. Prob. Let. 24, 365-373.

Yeh, A.B., Lin, D.K.J., Zhou, H. and Venkataramani, C. (2003) A multivariate exponentially weighted moving average control chart for monitoring process variability. J. App. Stat. 30, 507-536.

**50. Statistical Process Control: economic and multivariate**

*Authors:* András Zempléni (Eötvös Loránd University, Budapest) and Csilla Hajas (Eötvös Loránd University, Budapest), Katalin Szabó (Eötvös Loránd University, Budapest), Belmiro Duarte (Instituto Superior de Engenharia de Coimbra, Portugal) and Pedro Saraiva (Department of Chemical Engineering, University of Coimbra, Po *Keywords:* economic charts, process control, multivariable control *Format:* presentation (Statistical modelling) *Contact:* zempleni@ludens.elte.hu

At previous ENBIS conferences [1], [2], the authors presented work on the application of Markov chains to the cost-optimal definition of SPC charts, designated as economic charts. Extending this previous work, we consider the related question of the so-called economic-statistical charts, where the economic optimization is carried out only conditionally on some statistical constraints, such as the type I error probabilities [3]. We show our results towards the unification of these related notions. We extend our results to the case of multivariable control, with relevant control charts optimised with respect to different, realistic cost functions associated with sampling, false alarms and non-detected shifts. The practical results obtained through the application of our approaches to industrial data collected from a Portuguese paper mill are also presented, showing the potential benefits derived from their use in real environments for achieving adequate statistical process control and monitoring.

References

[1] Zempleni, A., Hajas, Cs., Duarte, B. and Saraiva, P. "Optimal Cost Control Charts for Shift Detection", presented at the third European Network for Business and Industrial Statistics (ENBIS) conference, Barcelona, Spain (2003).

[2] Zempleni, A., Hajas, Cs., Duarte, B. and Saraiva, P. "Probability Estimation for Mixture Distributions and their Application to Statistical Process Control", presented at the fourth European Network for Business and Industrial Statistics (ENBIS) conference, Copenhagen, Denmark (2004).

[3] Saniga, E.M., Economical statistical control chart designs with an application to X and R charts. Technometrics, 31, 313-320 (1989).

**68. A case study in establishing a process with competing outcomes**

*Author:* Maria Lanzerath (W. L. Gore & Associates) *Keywords:* analysis of designed experiment, competing outcomes, score function *Format:* presentation (Design of experiments) *Contact:* mlanzera@wlgore.com

Target groups: engineers/ non-statisticians, statisticians new to DoE

In the analysis of a designed experiment we often measure more than one outcome variable. If the goal of the DoE is to determine a good parameter setting, e.g. for a machine, we need one process window that delivers good quality for all the product features we have evaluated. Sometimes these features are contradictory and would require different process settings, which is of course impossible. This presentation shows how to cope with competing outcome variables in the statistical analysis by building a common score function and analysing this one score function. This score may combine very different types of outcome data at the same time, e.g. subjective ratings and objective measures. It is shown how to construct the score and how to do the analysis. The technique is applied to a case study from the textile industry on making seams for garments.

### 4c. DOE case studies

**16. COMPARING DIFFERENT FRACTIONS OF A FACTORIAL DESIGN - A METAL CUTTING CASE STUDY**

*Authors:* Erik Mønness (Hedmark University College) and M.J.Linsley & I.E. Garzon (Industrial Statistics Research Unit, University of Newcastle upon Tyne, United Kingdom) *Keywords:* Fractional-factorial designs, folded designs, Full factorial design, data transformations, significance *Format:* presentation (Design of experiments) *Contact:* erik.monness@hihm.no

Full factorial designs of a significant size are very rarely performed due to the number of trials involved and unavailable time and resources. The data in this paper were obtained from a six-factor full factorial (2^6) designed experiment that was conducted to determine the optimum operating conditions for a steel cutting milling process. The six factors (Tool speed, Work piece speed, Depth of cut, Coolant, Direction of cut and Number of cuts) were each varied at two levels. Each treatment was repeated 8 times, so the mean and standard deviation of each treatment were available. A reciprocal transformation was applied initially to stabilize the variance between treatments. The aim of this paper is to analyse 2^(6-3) (one-eighth) and 2^(6-2) (one-fourth, also obtained as fold-over sets from the one-eighth sets) fractional factorial designs and empirically compare them with the full 2^6 design with 64 runs. There are 8 disjoint subsets of runs constituting a 2^(6-3) design and 4 disjoint subsets of runs constituting a 2^(6-2) design. Four of the 2^(6-3) experiments, in a separate investigation, have been partly de-aliased by adding 4 more runs. Plackett-Burman designs with 12 runs are also analysed (there are two of them). Estimates from the fractional designs are distorted by aliasing and a higher experimental error. The talk gives empirical evidence concerning the improvement in results by adding more runs and by choosing different designs. The stability of the experimental factor estimates across the different "equal" fractions is also investigated. The talk provides evidence for managers and engineers that the choice of an experimental design is very important and highlights how designs of a minimal size may not always reveal relevant information.
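The statement that the 64 runs split into 8 disjoint one-eighth fractions can be checked with a short sketch. The generators used here (D = ±AB, E = ±AC, F = ±BC) and the factor labels A to F are assumptions made for illustration; the abstract does not say which generators the authors chose.

```python
from itertools import product

# Full 2^6 factorial: each run is a tuple of levels for factors A..F.
full = list(product([-1, +1], repeat=6))


def fraction(sd, se, sf):
    """Runs of the 2^(6-3) fraction with D = sd*A*B, E = se*A*C, F = sf*B*C."""
    return [
        run for run in full
        if run[3] == sd * run[0] * run[1]
        and run[4] == se * run[0] * run[2]
        and run[5] == sf * run[1] * run[2]
    ]


# One fraction per sign pattern of the three generators.
fractions = [fraction(sd, se, sf) for sd, se, sf in product([-1, +1], repeat=3)]
all_runs = {run for frac in fractions for run in frac}
print([len(frac) for frac in fractions], len(all_runs))

# Folding over (reversing every sign) maps a fraction onto the one whose
# generator signs are all reversed; the two together give 16 runs.
principal = fraction(+1, +1, +1)
folded = {tuple(-level for level in run) for run in principal}
print(folded == set(fraction(-1, -1, -1)))
```

Each fraction contains 8 runs and no run belongs to two fractions, so comparing effect estimates across fractions, as the paper does, uses every run of the full design exactly once.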

**22. The False Discovery Rate for Multiple Testing in Large Factorial Experiments**

*Authors:* David Steinberg (Tel Aviv University) and Steinberg, D.M., Triploski, M. and Benjamini, Y. *Keywords:* Factor screening; Lenth's method; Multiple comparisons. *Format:* presentation (Design of experiments) *Contact:* dms@post.tau.ac.il

Identifying the important factors and effects is an important goal in the analysis of two-level fractional factorial experiments. In moderate and large experiments, the large number of potential effects presents problems of multiple statistical inference. In this work we apply the False Discovery Rate (FDR) idea in multiple inference to effect identification in factorial experiments. The FDR is the expected proportion of inert effects among those declared active, which we believe is a more relevant quantity to control than the probability of a single erroneous declaration, the criterion that has been adopted in previous studies. We show how to combine the control of FDR with popular methods for estimating contrast standard error in unreplicated experiments. We illustrate the ideas using an industrial experiment that studied eight factors in 128 runs, with replication and division into 8 blocks. In this experiment, the FDR identifies many more active effects than other methods. We will also present some simulation results showing the improvements in power that can be obtained by controlling the FDR.
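As an illustration of FDR control, here is a minimal sketch of the Benjamini-Hochberg step-up procedure applied to a list of effect p-values. The p-values are invented for the example, and the sketch does not reproduce the authors' combination with Lenth-type standard error estimates; it only shows the generic step-up rule.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of effects declared active at FDR level q (step-up rule)."""
    m = len(p_values)
    # Indices of the effects, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies under the line rank * q / m.
        if p_values[idx] <= rank * q / m:
            largest_k = rank
    return sorted(order[:largest_k])

# Hypothetical p-values for ten estimated effects (illustrative only).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
active = benjamini_hochberg(p, q=0.05)
```

With these made-up values only the first two effects fall under the step-up line at q = 0.05.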

**27. The use of a 12 run Plackett-Burman design in the injection moulding production of a technical plastic component.**

*Authors:* John Tyssedal (The Norwegian University of Science and Technology) and Hallgeir Grinde (Elkem ASA) *Keywords:* Screening, Plackett-Burman, Multiple responses. *Format:* presentation (Design of experiments) *Contact:* tyssedal@stat.ntnu.no

Microplast AS is located at Stjørdal, a small town just northeast of Trondheim. It is one of the leading companies in the injection moulding of technical plastic components in Scandinavia. An injection moulding machine may have 15-20 variables that need to be set to operational conditions when the production of a new product is started. This case study reports the planning, experimentation and analysis of a screening experiment performed at Microplast AS in order to find settings of the machine variables such that the required specification limits for a certain product could be met while keeping the injection moulding process fast. Eight factors were included in the experiment and nine responses were measured. The design used was a 12 run Plackett-Burman design.
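For readers unfamiliar with the design, the sketch below builds a 12-run Plackett-Burman design from the usual cyclic generator row and keeps eight columns for eight factors. Which eight columns the authors used, and how factors were assigned to them, is not stated in the abstract; taking the first eight is an assumption made for illustration.

```python
import numpy as np

# Standard generator row of the 12-run Plackett-Burman design.
GENERATOR = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])

def plackett_burman_12():
    """Return the 12 x 11 design matrix with -1/+1 coded levels."""
    # Eleven rows are cyclic shifts of the generator; the last row is all -1.
    rows = [np.roll(GENERATOR, shift) for shift in range(11)]
    rows.append(-np.ones(11, dtype=int))
    return np.vstack(rows)

design = plackett_burman_12()

# Assumption: eight factors assigned to the first eight columns.
eight_factor_design = design[:, :8]
```

All eleven columns are mutually orthogonal and balanced, so main effects of the eight factors can be estimated independently of one another, although each is partially aliased with two-factor interactions.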

**85. Application of Stochastic Models to Engine Calibration**

*Author:* Justin Seabrook (Ricardo UK Ltd) *Keywords:* engine calibration DoE stochastic methods *Format:* presentation (Design of experiments) *Contact:* justin.seabrook@ricardo.com

The paper describes the development and calibration of high specification gasoline engines. Engine characterisation using Design of Experiments (DoE) is valuable for research and development projects; easy to apply DoE modelling techniques are essential for series calibration programmes. In the paper, the DoE process is described, taking as the main example the calibration of an engine featuring both variable valve timing (VVT) and direct injection, and highlights the planning of the experiments, the requirements for the DoE tools and the process for optimisation using DoE models.

**110. Construction of Marginally and Conditionally Restricted Designs in Uranium Production**

*Authors:* Jesús López Fidalgo (Universidad de Salamanca) and Ignacio Ballesteros (ENUSA Industrias Avanzadas S.A.), Jesús López-Fidalgo (University of Salamanca), Raúl Martín-Martín (Pontifical University of Salamanca), Ben Torsney (University of Glasgow) *Keywords:* optimal experimental designs, uncontrollable factors, uranium pellets *Format:* presentation (Design of experiments) *Contact:* fidalgo@usal.es

In standard classical and experimental design all explanatory variables in a model are usually assumed to be under the control of the experimenter. However there are many experimental situations in which some of the independent variables cannot be controlled by a practitioner. We consider the production of uranium rods for use as fuel in nuclear plants. One important part of this procedure consists of sintering the uranium pellets. They must have a specific density as well as a specific porosity. The pores are to retain fission gases during the heating process in the nuclear plant. For this purpose the initial blend is supplemented with some additives. After a period of time in the furnace the additives are burned off resulting in pores being formed in the pellet. The following variables are considered: "Initial density", which is not subject to control but its values are known before the test is carried out; and "Percentage of additive U3O8", which is completely under the control of the experimenter. There is interest in describing the final density as a function of the initial density and of the percentage of additive. The model which has been adopted in practice is quadratic regression. General design theory needs to be adapted to this new situation. Several criteria are taken into consideration to find optimal conditional designs given some prior information on the distribution of the uncontrollable factors. In order to determine these optimal conditional designs an efficient class of multiplicative algorithms is considered.
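A minimal sketch of a multiplicative algorithm may help here. It computes D-optimal weights for a one-variable quadratic regression on a grid, with every point fully controllable. This is the simple unrestricted case only; the marginally and conditionally restricted designs of the abstract need extra constraints that are not shown. The grid, the iteration count and the function name are illustrative assumptions.

```python
import numpy as np

def d_optimal_weights(points, n_iter=2000):
    """Multiplicative algorithm for D-optimal weights, model 1 + x + x^2."""
    # Regression vectors f(x) = (1, x, x^2), one row per candidate point.
    F = np.column_stack([np.ones_like(points), points, points**2])
    p = F.shape[1]
    # Start from the uniform design over the candidate points.
    w = np.full(len(points), 1.0 / len(points))
    for _ in range(n_iter):
        # Information matrix M(w) = sum_i w_i f(x_i) f(x_i)^T.
        M = F.T @ (w[:, None] * F)
        # Variance function d_i = f(x_i)^T M^{-1} f(x_i).
        d = np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F)
        # Multiplicative update; weights stay normalised since sum_i w_i d_i = p.
        w = w * d / p
    return w

grid = np.linspace(-1.0, 1.0, 21)
weights = d_optimal_weights(grid)
```

For this model the weights concentrate on the points -1, 0 and 1 with mass one third each, which is the known D-optimal design for quadratic regression on an interval.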

### 4d. Statistical consulting

**45. Statistics for Quality and quality magazines**

*Author:* Fausto Galetto (Politecnico di Torino) *Keywords:* Statistics, Probability, Quality, Reliability Predictions, Reliability Tests, Control Charts, DOE, Robust Design, Scientific Approach *Format:* presentation (Statistical modelling) *Contact:* fausto.galetto@polito.it

The literature on "Quality" matters is rapidly expanding. Higher Education is seen many times as a Production System, and students are considered as "Customers". Books and magazines are suggested to students, attending "Statistics or Quality Courses" at Universities, and Master courses as well. Some of them are good, some are not so good. Students use papers from "quality magazines" for their theses; some papers have good Quality, some are not very good. Therefore it seems important to stand back a bit and meditate, starting from a managerial point of view. In the paper, using published documents (found in magazines used by managers and professionals, and suggested to students):

- First, we will analyse some of the papers from the point of view of the QUALITY PRINCIPLES, stated in the ISO 9000:2000 standard.

- Then we will see "How could a Manager decide about the methods proposed in the papers".

- Later, we will see "How could a Manager find a scientific and good method".

- Last we will show how the "Scientific Approach" is able to prevent all the non-conformities that can be found.

Quality Engineering looks at designing Quality into every product and all the processes. Design is surely the most important phase for Quality both of products and processes (as are courses at Universities or in Companies).

Quality is therefore really needed in higher education.

Quality must be designed for "Quality Courses" in higher education.

Students need "Quality papers" on Quality.

Consultants need statistical knowledge if they want to help companies in making Quality.

We will show that

facts and figures are useless, if not dangerous, without a sound theory. (F. Galetto), or better, using Deming's own words

"Management need to grow-up their knowledge because experience alone, without theory, teaches nothing what to do to make Quality". (Deming)

**66. Estimation of Taguchi Loss Function by Error Propagation Model**

*Authors:* Adam Jednoróg (Center for Advanced Manufacturing Technologies) and Koch Tomasz (Center for Advanced Manufacturing Technologies) *Keywords:* loss function, variation transmission *Format:* presentation (Six Sigma and quality improvement) *Contact:* adam.jednorog@pwr.wroc.pl

The impact of variation in a given process characteristic on the loss incurred by the customer is usually expressed by a loss function. Because precise estimation of such a function is impossible, or at least very difficult, in many supplier-customer relations, the quadratic loss function proposed by Taguchi is usually used. This paper proposes a new model for estimating the losses at a given stage of consecutive operations that are caused by one of the previous operations. It takes into account the amount of variation transmitted between the operations considered. An autoregressive model is used to model variation transmission. Finally, loss estimates from the quadratic loss function and from the proposed model are compared.
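For reference, the baseline Taguchi quadratic loss is L(y) = k (y - T)^2, and its average over a set of units splits into a variance term and an off-target term. The short sketch below checks that decomposition numerically. The measurements, target and cost coefficient k are made-up values, and the authors' error propagation model is not reproduced.

```python
from statistics import fmean, pvariance

def quadratic_loss(y, target, k):
    """Taguchi quadratic loss for a single unit with characteristic y."""
    return k * (y - target) ** 2

def expected_quadratic_loss(values, target, k):
    """Average loss written as k * (variance + squared bias from target)."""
    mu = fmean(values)
    return k * (pvariance(values, mu) + (mu - target) ** 2)

# Hypothetical measurements of a process characteristic (illustrative only).
measurements = [9.8, 10.1, 10.4, 9.9, 10.3]
target, k = 10.0, 2.0

average_loss = fmean(quadratic_loss(y, target, k) for y in measurements)
decomposed_loss = expected_quadratic_loss(measurements, target, k)
```

Both quantities agree, which shows that under the quadratic loss any variation transmitted from an upstream operation adds to the loss through the variance term.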

**103. Data representation by orthogonal expansions: Principal Components Analysis and Karhunen-Loeve expansion. How do PCA and KL differ?**

*Authors:* Iryna Snihir (Eurandom) and William Rey (Eurandom) *Keywords:* Principal Components Analysis, Karhunen-Loeve Expansion, Singular Value Decomposition, Orthogonal Basis Functions *Format:* presentation (Statistical consulting) *Contact:* snihir@eurandom.tue.nl

Principal components analysis (PCA) and the Karhunen-Loeve (KL) expansion are two similar linear methods that have proven to be extremely useful in the study of physical, biological, electrochemical and other complex phenomena. By providing insight into the intrinsic structure of data, these techniques offer means to compress the data, to filter out and to estimate noise. Although similar in their goals and essence, one may be more appropriate than the other for a particular application. The goal of this presentation is to emphasize the often ignored difference between principal component analysis and the Karhunen-Loeve expansion and to reveal the advantages of appropriately choosing between the two. KL is an approximation technique: the data are projected onto a lower dimensional linear subspace which minimizes the sum of the squared deviations of the data vectors from their projections onto that subspace. This results in the representation of the data in terms of an orthogonal basis for that subspace. This representation is the most efficient in the mean-square sense, and in particular, no other representation with fewer basis functions can achieve the resulting accuracy. On the other hand, PCA is used to explore the variance-covariance structure of the data. Although the viewpoints are quite different, the implementations are very closely related: singular value decomposition (SVD) is applied in both cases to determine unique orthogonal basis functions (eigenvectors) and the corresponding singular values. This difference of viewpoints delineates the fields of applicability of each of these methods. We will review the concepts and basic derivations of both methods, give appropriate geometric interpretations and compare the two methods.
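The shared SVD machinery can be sketched in a few lines of NumPy. The example uses simulated data and centres the columns before decomposing. The same decomposition is then read in two ways: as the variance explained per component (the PCA viewpoint) and as the squared error of the best rank-k approximation (the KL viewpoint). The data, the choice k = 2 and the variable names are assumptions for illustration, not the authors' setup.

```python
import numpy as np

# Simulated correlated data: 200 observations of 5 variables (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Centre the columns, then take the thin SVD: Xc = U diag(s) Vt.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# PCA viewpoint: variances along the principal axes.
explained_variance = s**2 / (len(X) - 1)

# KL viewpoint: best rank-k approximation in the least-squares sense.
k = 2
scores = Xc @ Vt[:k].T            # coordinates in the k-dimensional subspace
approx = scores @ Vt[:k]          # projection back into the original space
residual = np.sum((Xc - approx) ** 2)
```

The residual equals the sum of the squared discarded singular values, which is the mean-square optimality property mentioned in the abstract.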

**106. An analysis of the use of the Beta distribution for planning large complex projects**

*Author:* Christian Hicks (Newcastle University Business School) *Keywords:* Project planning; stochastic models; beta distribution *Format:* presentation (Business and economics) *Contact:* Chris.Hicks@ncl.ac.uk

Large complex engineering projects are usually planned using project management systems that are based upon the Project Evaluation and Review Technique (PERT). This approach uses the Beta distribution to represent uncertainties in the duration of activities. The probability density function for a Beta distribution can be uniform, symmetric or skewed depending upon the parameters used. Planners produce three estimates of activity durations: most likely, optimistic and pessimistic. These estimates are used to configure the parameters of the Beta distribution. In practice, uncertainties are cumulative; for example an assembly process cannot start until the latest component is available. This paper explores the relationship between the planning values used, the Beta distribution parameters and shape. A simulation case study is then presented that is based upon data obtained from a capital company that produced large complex products in low volume. The work evaluates the impact that different types of uncertainty have on manufacturing performance for products with many levels of assembly.
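The link between the three planning estimates and the Beta parameters can be sketched as follows, using the common PERT convention with weight 4 on the most likely value. That convention, the example estimates and the three-component assembly are assumptions for illustration; the abstract does not state which parameterisation or data the author used.

```python
import random

def pert_beta(optimistic, most_likely, pessimistic, lam=4.0):
    """Beta shape parameters and mean duration under the usual PERT convention."""
    span = pessimistic - optimistic
    alpha = 1.0 + lam * (most_likely - optimistic) / span
    beta = 1.0 + lam * (pessimistic - most_likely) / span
    mean = (optimistic + lam * most_likely + pessimistic) / (lam + 2.0)
    return alpha, beta, mean

def sample_duration(rng, optimistic, most_likely, pessimistic):
    """Draw one activity duration from the scaled Beta distribution."""
    alpha, beta, _ = pert_beta(optimistic, most_likely, pessimistic)
    return optimistic + (pessimistic - optimistic) * rng.betavariate(alpha, beta)

rng = random.Random(1)
estimates = (2.0, 5.0, 14.0)      # hypothetical planning values, e.g. in weeks
alpha, beta, mean = pert_beta(*estimates)

# Durations of a single activity.
single = [sample_duration(rng, *estimates) for _ in range(20000)]

# An assembly can start only when the latest of three components is available.
assembly_start = [
    max(sample_duration(rng, *estimates) for _ in range(3)) for _ in range(20000)
]
```

The average assembly start time exceeds the mean duration of any single component, which is one way the cumulative effect of uncertainty across assembly levels shows up in simulation.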

**109. Development of Balanced Scorecard into Six Sigma organizations**

*Authors:* Michal Baranowicz (Wroclaw University of Technology), Kamil Torczewski, Monika Olejnik *Keywords:* Six Sigma, Balanced Scorecard, Project selection process *Format:* presentation (Six Sigma and quality improvement) *Contact:* michal.baranowicz@pwr.wroc.pl

Six Sigma is still undeniably one of the most powerful business strategies, offering an opportunity to increase a company’s profit. Among the key success factors for deploying a Six Sigma strategy within a company are the right project portfolio and a proper project selection strategy. This is especially important in big companies, where the size of the Six Sigma infrastructure and the huge number of projects can result in losing control of improvement activities. In the worst case, improvements go in different, sometimes opposite, directions. The ability to manage the project portfolio and connect its content with business objectives becomes crucial. One of the most effective tools for doing that is the Balanced Scorecard. It is relatively well known and quite often used in companies; however, not all of its potential is utilized. This article presents the possibilities and methods of using the Balanced Scorecard as a tool that supports the project selection process.

### 5a. Workshop: Reliability studies part 3

### 5b. Process models and engineering process control

**38. Brush wear detection by Continuous Wavelet Transform**

*Author:* Talía Figarella Gómez (EURANDOM) *Keywords:* Brush wear, change points, wavelet transform modulus maxima *Format:* presentation (Reliability and safety) *Contact:* Figarella@eurandom.tue.nl

The brushes of a DC motor are pieces of current-conducting material that ride on the commutator of the motor, forming an electrical connection between the armature and the power source. They are among the parts of the motor most prone to wear, and worn brushes damage the commutator segments. Visual inspection of the intensity and number of sparks at the brush-commutator interface is an accepted measure of brush condition. This method, however, is not suitable for on-line monitoring. We propose a wavelet-based approach to extract indicators from the current related to the wear of the brushes that can be used for on-line monitoring. A commutation wave consists of two parts: the rising and falling parts. As the mechanical contact between brushes and commutation segments increases, the rising part becomes wider, and this is an indication of wear in the brushes. We implement an algorithm based on wavelet maxima lines to localize the singular points where the commutation wave changes from the rising to the falling part. These lines are then used to calculate the width of the rising part as an indicator of the condition of the brushes.
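The idea of locating the rise-to-fall transition from wavelet modulus maxima can be illustrated on a synthetic wave. The sketch below is a simplified stand-in, not the authors' algorithm. It convolves a triangular test signal with a Mexican-hat wavelet at a few scales and takes the position of the largest modulus at each scale. The signal, the scales and the function names are assumptions.

```python
import numpy as np

def mexican_hat(scale):
    """Sampled Mexican-hat wavelet, shifted so that it sums to zero."""
    half = int(4 * scale)
    t = np.arange(-half, half + 1, dtype=float)
    psi = (1.0 - t**2 / scale**2) * np.exp(-t**2 / (2.0 * scale**2))
    return psi - psi.mean()

def wavelet_modulus(signal, scale):
    """Modulus of the wavelet transform of the signal at one scale."""
    psi = mexican_hat(scale)
    half = len(psi) // 2
    # Edge padding keeps the output aligned with the input samples.
    padded = np.pad(signal, half, mode="edge")
    return np.abs(np.convolve(padded, psi, mode="valid"))

# Synthetic commutation wave: rises for 60 samples, then falls for 40.
rising = np.linspace(0.0, 1.0, 61)
falling = np.linspace(1.0, 0.0, 41)[1:]
wave = np.concatenate([rising, falling])

# Position of the largest modulus at each scale; a maxima line links these.
scales = (2.0, 3.0, 4.0)
apex_estimates = [int(np.argmax(wavelet_modulus(wave, s))) for s in scales]

# Width of the rising part in samples, measured from the start of the wave.
rise_width = apex_estimates[0]
```

Because the modulus maximum stays at the same position across scales, it marks a genuine singular point of the wave rather than noise, which is the property a maxima-line method exploits.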

**115. Statistical surveillance and geotechnical monitoring systems**

*Authors:* Alessandro Fasso (University of Bergamo) and Domenico Bruzzi, Giorgio Pezzetti, Orietta Nicolis, Michela Cameletti *Keywords:* *Format:* presentation (Process modelling and control) *Contact:* alessandro.fasso@unibg.it

Geotechnical monitoring systems are based on a network of possibly heterogeneous measuring systems placed along the monitored object, which may be a building, a structure, an excavation area or a landslide. Considering single field instrument systems, Fassò et al. (2004, 2005) show that meteorological and environmental causes may affect the measurements and that robust statistical adjustment and smoothing may greatly reduce measurement uncertainty. In this talk we first consider the statistical approach to modelling a monitoring system, then we define the generalized surveillance setup based on stochastic monitoring techniques which extend control charts. Case studies from field monitoring will illustrate the methods.

References

Fassò A., Nicolis O., Bruzzi D., Pezzetti G. (2004) Statistical modelling and uncertainty reduction of monitoring data in geomechanics, Working paper n.3/MS, Dept. IGI, University of Bergamo (http://www.unibg.it/struttura/struttura.asp?cerca=digi_dp)

Fassò A., Nicolis O., Bruzzi D., Pezzetti G. (2005) Modelling and reducing uncertainty of field monitoring data in geomechanics by computerized statistical methods, IACMAG2005, Turin.

**47. Statistical driven development of OBD systems. An overview**

*Author:* Stefano Barone (University of Palermo) *Keywords:* *Format:* presentation (Process modelling and control) *Contact:* stbarone@dtpm.unipa.it

During the last five years a lot of work has been done in a series of projects carried out in cooperation between Italian University and the automotive industry. The projects concerned the analysis of performance and the optimisation of on-board diagnostic (OBD) systems using statistical methodologies. Automotive OBD systems are designed to keep critical components under control while the vehicle is operating and to alert the driver in case of severe malfunctions. OBD systems aimed at controlling emissions-related components are compulsory on all new vehicles because of their evident impact on environmental pollution. The results of the projects have been quite successful in terms of knowledge advancement and industrial gain. Some specific results were summarised in preceding papers presented at conferences and submitted for publication in academic journals. The topic of design and analysis of OBD systems is still of great interest in many areas of theoretical and applied research. For example a recent call of the European Commission had as objective to support research and development of systems that process and integrate biomedical information from different levels (e.g. molecule, cell, tissue, organ) and from many different places (e.g. clinical resources, biomolecular resources) with the purpose to improve health discovery and understanding and the health status of an individual, i.e. to improve disease prevention, diagnosis and treatment. In this paper, an updated overview is given of the methodologies adopted and the results obtained, directly and indirectly related to the whole research project. These results could be of value for both theorists and practitioners since they witness the powerful use of statistics as a catalyst of technical progress and suggest a possible line of action for further application in different fields of research.

### 5c. DOE general

**78. Simulation models for optimising the design of clinical trials**

*Authors:* Ismail Abbas (Universitat Politècnica de Catalunya) and Joan Rovira (Health Economics and Social Policy Research Center); Erik Cobo (Universitat Politècnica de Catalunya); Josep Casanovas (Universitat Politècnica de Catalunya) *Keywords:* Clinical trials, simulation, optimisation, cost, duration, power *Format:* presentation () *Contact:* ismail.abbas@fib.upc.es

Background: Clinical trials are required for marketing authorization of new products and new indications. However, trials are often costly and have a long duration. Therefore, they should be designed so as to maximize the likelihood of attaining their objective, namely to provide the evidence required for marketing authorization at the lowest possible cost. Moreover, an error in the design of a clinical trial can make it impossible to confirm the hypothesis because the results lack statistical significance. This implies a loss of resources and a delay in the introduction of the new pharmaceutical product into the market. Objectives: The main objective of this work is the construction of a simulation model that integrates clinical research with statistical, economic and computing aspects in order to optimise the design of clinical trials. Methods: We developed a simulation model for optimising the design of clinical trials which allows their total expected cost to be reduced and therefore the expected net benefit to be maximised. This model consists of different sub-models: the first is a patient recruitment and treatment assignment sub-model; the second is the follow-up sub-model, which describes the progression of the disease in question; and the third is the cost sub-model, which integrates the cost function into the general model. We used statistical analysis to estimate the most efficient statistical design and to find the optimal design that reduces the cost and duration of the trials. Results: The results of this work have been applied to a clinical trial of lipodystrophy in HIV/AIDS patients on antiretroviral treatment. The model has been built and validated, and a statistical model has been selected whose results fit the results of the trial as closely as possible. The same model is later applied to the optimisation of the design of a future clinical trial.
The model estimates that the number of trial centers that minimizes the total cost is between 3 and 6, and that the number of follow-up visits ranges between 4 and 8. In this case, the optimal cost lies between 89 and 94 thousand monetary units. Similar results were found when missing data were taken into account. When the clinical trial is carried out in one center with two follow-up visits, the expected cost would be 378 monetary units, which is the highest cost found in this work. Discussion and conclusions: The approaches currently applied to the design of clinical trials, including those that use simulation models, do not take into account the relevant aspects for the overall optimisation of the design. The main contribution of this work is to show the need to integrate all these aspects in a general simulation model for the design optimization of a clinical trial according to explicit criteria and assumptions. This research approach allowed the development of a general model of optimization where the optimal design is seen in terms of maximisation of the expected net benefit.

**18. Robust Designs for Misspecified Logistic Models**

*Authors:* Adeniyi Adewale (Department of Mathematical and Statistical Sciences) and Doug Wiens (University of Alberta, Edmonton, Canada) *Keywords:* Robust Designs, Logistic model, Misspecification, Simulated Annealing *Format:* presentation (Design of experiments) *Contact:* aadewale@ualberta.ca

We have developed criteria that generate robust designs and have used such criteria for the construction of designs that insure against possible misspecifications in logistic regression models. The classical approach to constructing designs for the logistic model has been predicated on the assumption that the specified model is exactly correct. In practice, this is often not the case; model misspecification is rather a norm than an exception. We illustrate the effects of misspecifications in the assumed model form and give expressions for quantifying the bias error engendered by model misspecification. The design criteria we propose are different from the classical in that we do not focus on sampling error alone by using just the information matrix as the design criterion. Instead we use design criteria that account as well for error due to bias engendered by the model misspecification. In addition to the complexity added by the dependency of the design criteria on unknown model parameters, as is the case in designs for non-linear models, the covariance of the estimated model parameter depends on the variance of the response prescribed by the true model. The residual-adjusted estimate of this variance is constructed by estimating the discrepancy between the true mean response and the assumed mean response. When there are initial data we estimate the discrepancy using the residuals. In the absence of initial data we generate a Monte Carlo sample of the contamination from the prescribed misspecification neighbourhood. We use the average contamination of the Monte Carlo sample as an estimate of the expected contamination and based on this the variance of the response is adjusted appropriately. We then propose a sandwich estimator, which is a function of the residual-adjusted variance estimator of the response, for the covariance of the estimated model parameter. 
Our robust design optimizes the average of a function of the sampling error and bias error over a specified misspecification neighbourhood. Examples of robust designs for logistic models are presented, including a dose-response design for rheumatoid arthritis patients.

**42. An efficient method to learn about the process behavior of sheet metal spinning - knowledge transfer by means of meta-models**

*Authors:* Nadine Henkenjohann (University of Dortmund) and Roland Göbel *Keywords:* stochastic process, meta-models, sheet metal spinning *Format:* presentation (Design of experiments) *Contact:* henkenjo@statistik.uni-dortmund.de

Sheet metal spinning is a complex forming process used in low volume production of rotationally symmetric workpieces. This process is characterized by highly nonlinear relationships and a liability of workpiece failure for most of the operable design space. The boundary of the failure region is unknown, because it depends on the geometry and material of the workpiece. At the ENBIS conference 2004, an adaptive sequential optimization procedure (ASOP) was presented which guarantees a successful and efficient optimization of this process for a fixed geometry. In this paper, meta-models combining the single models derived by the ASOP for different geometries will be developed to efficiently improve process understanding of sheet metal spinning. This approach obtains relevant information about the process behavior for a new workpiece geometry without additional experiments. In order to appropriately depict the changes in the process due to varying workpiece geometries, different kinds of information have to be taken into account. The most important information concerns the shape of the stable region. Pre-experiments have shown that the shape of the stable region can be approximated by n-dimensional ellipsoids. A meta-model will be defined to predict the form of the ellipsoids depending on varying workpiece geometries. Additionally, it is of interest to identify an area within the stable region leading to workpieces with a good overall quality. To achieve this goal a second type of meta-model is used, which is based on stochastic process models. The two-step meta-model approach is exemplified by optimizing cups of varying heights.

### 5d. Customer surveys and Kansei engineering

**11. Kansei Engineering for SME's**

*Authors:* Carolyn van Lottum (ISRU, The University of Newcastle upon Tyne) and Carolyn van Lottum, Kim Pearce, Shirley Coleman *Keywords:* Kansei Engineering, Semantics, SME's *Format:* presentation (Statistical consulting) *Contact:* c.e.vanlottum@btopenworld.com

Kansei engineering attempts to put emotion into design by statistically mapping the physical design features of a product to the emotional responses of the consumer. Many large companies have adopted the Kansei methodology and report great benefits from the approach, but it is felt that SME's could also gain a competitive edge using Kansei. This paper outlines results from the first major field trial carried out as part of KENSYS, a European research project into the use of Kansei Engineering by SME's. In this study the Kansei methodology has been applied to the design of everyday men's footwear, but in this paper we will also generalise the findings into guidelines for SMEs wishing to apply the ideas.

**14. Influence factors on satisfaction of car drivers. An application of the LISREL method.**

*Authors:* Laura Ilzarbe (Tecnun (University of Navarra)) and M. Jesús Alvarez; Elisabeth Viles (Tecnun, University of Navarra) *Keywords:* customer satisfaction; LISREL *Format:* presentation (Statistical consulting) *Contact:* lilzarbe@tecnun.es

It is a well-known fact that companies thrive on the loyalty of their customers, and customer loyalty is heavily dependent on customer satisfaction. This piece of research focuses on the development of a methodology for determining the factors that contribute to the overall satisfaction of customers. Moreover, the influence of those factors on overall customer satisfaction is quantified. The methodology is based on the LISREL method and is applied to a real-life case: a number of customers who bought a new car from a specific manufacturer three years ago. The following steps were taken: step 1, a questionnaire was designed; step 2, a telephone survey was carried out; step 3, an initial approach to finding the factors that generate customer satisfaction was made through an exploratory factor analysis; step 4, those factors were confirmed and their influence on overall customer satisfaction was quantified through LISREL.

**30. The application of the generalized polychoric correlation coefficient in the analysis of an employee satisfaction survey.**

*Authors:* Annarita Roscino (Department of social Sciences University of Bari) and Alessio Pollice *Keywords:* employee satisfaction, generalised polychoric correlation *Format:* presentation (Statistical modelling) *Contact:* annaritaroscino@yahoo.co.uk

It is in the interest of an organization to retain employees and to minimize turnover. Many studies have shown that employees' satisfaction level affects their intent to leave the company and that happy workers are more productive. The purpose of this study is to understand the factors that most influence the satisfaction of employees with their jobs. An online survey was carried out by Survey Solutions Ltd among the employees of an organization within the United Kingdom. Respondents were asked to evaluate, on a 1 to 5 scale, their overall job satisfaction and their satisfaction with management, work conditions, rewards and benefits, team working and some other aspects of their positions. A new correlation coefficient, the generalized polychoric correlation coefficient, was used to evaluate the relationship between the items of the questionnaire. Like the polychoric correlation coefficient, this measure is based on the assumption that there is a continuous variable underlying each ordinal score. The generalized polychoric correlation coefficient, however, is more flexible and more compatible with real data than the polychoric correlation coefficient because it is computed assuming that the underlying variables are skew-normally distributed. The new coefficient therefore makes it possible not only to measure the correlation between ordinal variables, but also to estimate the skewness of the underlying continuous variables. The results of the application of the generalized polychoric correlation coefficient are shown and compared with those obtained by previous studies.

### 6a. Reliability, maintainability and safety case studies

**17. A Bayesian S-shaped Model for Software Defect Prediction during Testing**

*Authors:* Kevin McDaid (Dundalk Institute of Technology) and Stephen Keenaghan (Motorola Ireland Ltd), Ray Murphy (Motorola Ireland Ltd) *Keywords:* Software Reliability Growth Modelling, Defect prediction, Bayesian Methods, Nonhomogeneous Poisson processes *Format:* presentation (Reliability and safety) *Contact:* kevin.mcdaid@dkit.ie

The accurate prediction of software defects during system testing can be of significant benefit to firms seeking both to allocate their testing resources efficiently and to decide with confidence when to release the system to the next stage in the process. However, this activity can also be highly problematic with particular issues encountered early in the process when little defect information is available. One approach to overcoming this difficulty is to use Bayesian methods to combine the available defect data with the expert knowledge of key personnel. This paper develops a Bayesian version of the Yamada S-shaped software reliability growth model that allows for the incorporation of expert opinion in the prediction process. The original Yamada model is a nonhomogeneous Poisson process with an s-shaped mean value function. The prediction methodology is applied to real software testing defect data for a system recently developed by a large organization. Most importantly, in the collection of this original data set, the company recorded the start and finish times for each suite of software tests. This execution time data provides the ideal basis for the application of the proposed model. The work illustrates how to elicit the expert knowledge from the software personnel and explains how the methodology can be used to predict future defects and to determine when to terminate the system testing activity. The authors establish the practical benefits for the system in question and draw useful lessons regarding the general application of software reliability growth models in the software industry, particularly in situations where coverage-based rather than operational profile-based testing is used. Close attention is paid to the very real question as to whether the model is suitable when a limited amount of defect data is available during the early stages of testing. 
A methodology for the prediction of defect occurrence before testing commences based on prior information alone is also illustrated.
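
As a small illustration of the original (non-Bayesian) Yamada model mentioned above, the sketch below evaluates the standard delayed S-shaped mean value function m(t) = a(1 - (1 + bt)e^{-bt}) and its intensity. The function names and the parameter values (`a` as the expected total number of defects, `b` as the detection rate) are illustrative assumptions; the authors' Bayesian extension and their expert-elicited priors are not reproduced here.

```python
import math


def yamada_mean(t, a, b):
    """Expected cumulative number of defects found by time t.

    Delayed S-shaped mean value function: m(t) = a * (1 - (1 + b*t) * exp(-b*t)).
    """
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


def yamada_intensity(t, a, b):
    """Defect detection intensity, the derivative of the mean value function."""
    return a * b * b * t * math.exp(-b * t)


def expected_new_defects(t_now, t_future, a, b):
    """Expected number of defects found between t_now and t_future."""
    return yamada_mean(t_future, a, b) - yamada_mean(t_now, a, b)


# Hypothetical values: 100 expected defects in total, detection rate 0.1 per hour.
a_example, b_example = 100.0, 0.1
remaining_after_40h = a_example - yamada_mean(40.0, a_example, b_example)
```

Under this model the expected number of undetected defects, `a - m(t)`, is one quantity a release decision could be based on.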

**19. Improving the Reliability and Robustness of a Module, Component Part of an Electronic System of Communications for Trains**

*Authors:* Elisabeth Viles (Tecnun-University of Navarre) and David Puente (Tecnun), Maria Jesús Alvarez (Tecnun) *Keywords:* Reliability, quality improvement, six sigma *Format:* presentation (Reliability and safety) *Contact:* eviles@tecnun.es

This work is part of a research project that is being developed by Tecnun-University of Navarre and Traintic S.L. and is funded by the Basque Government. The project's goal is to improve the reliability and robustness of an electronic communications system for trains based on the standard European TCN protocol. The work began in 2004 and has been carried out in two parts. The first part was carried out during 2004 and studied the improvement of the reliability of the whole system during its operational phase. These results were presented at the First International Conference on Six Sigma, which took place in Glasgow in December 2004. Using the results from this work and taking into account the data below, we decided to focus on the reliability of the entire system by analyzing only one module:

- In the last study, 69% of the reliability-related failures occurred in this module.

- This situation accounted for 38% of the warranty costs.

The goal set by the Traintic Management was to improve the reliability of the module by a factor of at least 8, without modifying the basic design and without making a large investment. The Six Sigma methodology, especially the stages of the DMAIC cycle, was specifically chosen in order to achieve the proposed goal. Over two months of work, we identified potential actions aimed at significantly improving the module. At this moment we are planning the tests that will allow us to check which is the best way to improve the reliability of the module while following the Management's instructions.

**32. Preservation of the IFR property for Markov Chain Imbeddable Systems**

*Authors:* Petros Maravelakis (Department of Statistics and Insurance Science, University of Piraeus) and M. V. Koutras (Department of Statistics and Insurance Science, University of Piraeus, Greece) *Keywords:* Markov chain imbeddable systems, k-out-of-n: F system, IFR closure *Format:* presentation (Reliability and safety) *Contact:* maravel@stat-athens.aueb.gr

In the present article, we consider a class of reliability structures which can be efficiently described through a finite Markov chain (Markov chain imbeddable systems) and investigate its closure with respect to the increasing failure rate (IFR) property. More specifically, we derive a sufficient condition for the system's lifetime to have an increasing failure rate when the independent and identically distributed components comprising it have this property. As an application of the general theory, we establish an alternative proof of the IFR closure property of the family of k-out-of-n systems.
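
The IFR closure of k-out-of-n systems can be checked numerically for a particular case. The sketch below is only a numerical illustration, not the authors' Markov chain argument: it assumes i.i.d. Weibull components with shape 2 (which are IFR) and a k-out-of-n:F system that fails once at least k components have failed. All function names and parameter values are hypothetical.

```python
import math


def system_reliability(t, n, k, shape=2.0, scale=1.0):
    """Reliability of a k-out-of-n:F system with i.i.d. Weibull components.

    The system survives at time t if fewer than k components have failed.
    """
    q = 1.0 - math.exp(-((t / scale) ** shape))  # component failure probability
    return sum(math.comb(n, j) * q**j * (1.0 - q) ** (n - j) for j in range(k))


def hazard_is_increasing(n, k, grid):
    """Check that the cumulative hazard H(t) = -log R(t) is convex on the grid.

    Convexity of H (increasing chord slopes) is equivalent to an increasing
    failure rate between the grid points.
    """
    cum_hazard = [-math.log(system_reliability(t, n, k)) for t in grid]
    slopes = [
        (cum_hazard[i + 1] - cum_hazard[i]) / (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    ]
    # A small tolerance absorbs floating-point noise.
    return all(later >= earlier - 1e-9 for earlier, later in zip(slopes, slopes[1:]))


time_grid = [0.05 * i for i in range(1, 41)]  # 0.05, 0.10, ..., 2.00
ifr_holds = hazard_is_increasing(n=5, k=3, grid=time_grid)
```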

**52. Statistical Optimization of Fatigue Tests in Mechanical Component Experimentation**

*Author:* Nikolaus Haselgruber (AVL List GmbH Graz) *Keywords:* Reliability, Accelerated Failure Time Model *Format:* presentation (Reliability and safety) *Contact:* nikolaus.haselgruber@avl.com

In the automotive industry, fatigue tests are usually applied to ensure the reliability and the durability of mechanical components. Standard practice is to employ a simple linear model for point estimators of the durability. Its multiplication by an empirical safety factor leads to an upper bound interpreted as a confidence limit. An alternative approach is the application of an accelerated failure time model. It delivers an estimate of the durability with confidence bounds based on statistical theory. In the present case, different error distributions have been investigated and confidence bounds are computed by simulations, which are compared with approximations based on Fisher's local information matrix. This method leads to less experimental effort as well as a better knowledge of the durability of such components.

**64. Statistical Certification of Software Systems**

*Authors:* Alessandro Di Bucchianico (Eindhoven University of Technology, Dept. of Mathematics) and K.M. van Hee (Eindhoven University of Technology), J.-F. Groote (Eindhoven University of Technology / CWI) *Keywords:* software reliability *Format:* presentation (Reliability and safety) *Contact:* a.d.bucchianico@tue.nl

Most papers on software release interpret release as an optimal stopping problem with a loss function that takes into account costs of extra testing and costs of undetected errors. This way of testing addresses the perspective of the software producer. We are interested in software consumers. Suppose software has been tested and we wish to certify, with a certain confidence, that the software does not contain errors. The basic decision criterion is the number of error-free tests since the last error. An important feature is that we do not assume any knowledge of the initial number of errors in the software. We show how to implement our procedure when theta, the probability of detecting an error, is known and study its performance. Only if we have observed many similar systems during testing may we have an estimate of theta. Often such an estimate is not available. Therefore we also study a generalization of this model where we assume in a Bayesian manner that theta is unknown but has a prior distribution. We also present a one-stage testing procedure with unknown detection probability, which may be used by a certification agency when there is already a test history from the producer of the software system. A fully Bayesian setting where the testing parameters are updated after each detected error is studied through simulation. Surprisingly and fortunately, this procedure has a premature release probability that is not much higher than in the one-stage testing case.
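
To see why a run of error-free tests can support certification, consider a deliberately simplified calculation, which is not the authors' procedure: assume at least one error remains and that each test independently detects an error with known probability theta. The chance of n consecutive error-free tests is then at most (1 - theta)^n, so the smallest n that pushes this below a chosen level alpha follows directly. The function name and the values of `theta` and `alpha` are hypothetical.

```python
import math


def error_free_tests_needed(theta, alpha):
    """Smallest n with (1 - theta)**n <= alpha.

    theta: assumed known per-test probability of detecting a remaining error.
    alpha: accepted probability of n error-free tests despite a remaining error.
    """
    if not (0.0 < theta < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("theta and alpha must lie strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1.0 - theta))


# Hypothetical values: 10% detection probability per test, 5% accepted risk.
n_required = error_free_tests_needed(theta=0.1, alpha=0.05)
```

The weaker the detection probability, the longer the required error-free run, which is one reason an unknown theta needs separate treatment.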

### 6b. Statistical modelling

**4. Submarine Diesel Generators and Statisticians: ne'er the twain shall meet.**

*Author:* Christopher McCollin (The Nottingham Trent University) *Keywords:* Repairable systems, Maintenance data, Exploratory Data Analysis *Format:* presentation (Data mining) *Contact:* Christopher.McCollin@ntu.ac.uk

The USS Halfbeak and USS Grampus failure time data have been analysed separately for the last 50 years (e.g. Lee 1980, Ascher and Feingold 1984, Crowder et al 1993). Both sets of data are from diesel generators on submarines mainly operating out of the East Coast of the United States. The analysis in this paper considers the times between maintenance actions as well as the performance characteristics of the generators, American history and previous analyses as well as information gleaned from various web sites. The data can be viewed as coming from two separate maintenance regimes: a common maintenance policy across the submarines based on crew turnaround and wartime or manoeuvres indicated by missing data. There is also a possible systematic reliability problem on one of the submarines. The paper concludes by highlighting problems with data collection and suggests methods for looking for latent information within data. Comments are given regarding the procedure for this type of analysis (repairable systems), possible improvements to software to aid analysis and reasons why this particular analysis has not been carried out before.

**91. Hybrid method for quantifying and analyzing Bayesian Belief Nets**

*Authors:* Anca Hanea (TUDelft) and P. van Leeuwen, Dr. Dorota Kurowicka, Prof. dr. Roger M. Cooke *Keywords:* Bayesian Belief Nets, Dependence modelling, Vines *Format:* presentation (Statistical modelling) *Contact:* a.hanea@ewi.tudelft.nl

Bayesian Belief Nets (BBNs) have become a very popular tool to specify high dimensional probabilistic models. Their popularity is based on the fact that influence diagrams capture engineers' intuitive understanding of complex systems. Commercial tools with an advanced graphical user interface that support BBN construction and inference are available. Thus, building and working with BBNs is very efficient as long as one is not forced to quantify complex BBNs. A high assessment burden of discrete BBNs is often caused by the discretization of continuous random variables. Until recently, continuous BBNs were restricted to the joint normal distribution. In (Kurowicka, Cooke 2005) the 'copula-vine' approach to continuous BBNs is presented. This approach is quite general and allows traceable and defendable quantification methods, but it comes at a price: these BBNs must be evaluated by Monte Carlo simulation. Updating such a BBN will require resampling the whole structure. The advantages of fast updating algorithms for discrete BBNs are decisive. In this paper we combine the reduced assessment burden and modeling flexibility of the continuous BBN with the fast updating algorithms of discrete BBNs.

1. Quantify nodes of a BBN as continuous univariate random variables and arcs as conditional rank correlations;

2. Sample this structure;

3. Use the sample file in Netica to build conditional probability tables for a discretized version of the BBN;

4. Perform fast updating.

We will address some computational problems of this approach, as well as propose ways to solve them in some cases. We illustrate it with a practical example.
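
The four steps can be sketched for the smallest possible net, a single arc from a parent to a child. This is an illustrative sketch only: it assumes a normal copula for the arc (using the standard relation rho = 2 sin(pi r / 6) between rank correlation r and the underlying product-moment correlation), replaces Netica by plain counting with NumPy, and discretizes by sample quantiles so the marginal distributions do not need to be specified. All function names are hypothetical, and vines with conditional rank correlations for larger nets are not covered.

```python
import numpy as np


def sample_two_node_bbn(rank_corr, n_samples, rng):
    """Steps 1-2: sample parent and child joined by a normal copula."""
    rho = 2.0 * np.sin(np.pi * rank_corr / 6.0)  # rank -> product-moment correlation
    parent = rng.standard_normal(n_samples)
    noise = rng.standard_normal(n_samples)
    child = rho * parent + np.sqrt(1.0 - rho**2) * noise
    return parent, child


def discretize(x, n_bins):
    """Map a continuous sample to states 0..n_bins-1 using sample quantiles."""
    inner_edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(x, inner_edges)


def conditional_probability_table(parent_states, child_states, n_bins):
    """Step 3: estimate P(child state | parent state) by counting samples."""
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (parent_states, child_states), 1)
    return counts / counts.sum(axis=1, keepdims=True)


def update_parent(prior, cpt, observed_child_state):
    """Step 4: exact discrete updating of the parent given the child's state."""
    joint = prior * cpt[:, observed_child_state]
    return joint / joint.sum()


rng = np.random.default_rng(0)
parent, child = sample_two_node_bbn(rank_corr=0.8, n_samples=20000, rng=rng)
parent_states = discretize(parent, 4)
child_states = discretize(child, 4)
cpt = conditional_probability_table(parent_states, child_states, 4)
prior = np.bincount(parent_states, minlength=4) / len(parent_states)
posterior = update_parent(prior, cpt, observed_child_state=0)
```

Once the table is built, updating is a cheap array operation and no resampling of the continuous structure is needed.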

**96. Comparing a survey and a Conjoint Study**

*Authors:* Kim Pearce (ISRU) and Erik Mønness (Hedmark University College) and Shirley Coleman (ISRU) *Keywords:* questionnaire, conjoint analysis, survey analysis *Format:* presentation (Statistical consulting) *Contact:* k.f.pearce@ncl.ac.uk

As the name suggests, Conjoint Analysis makes it possible to consider several attributes jointly. It is a statistical method to analyse preferences on selected issues. However, Conjoint Analysis requires a certain amount of effort by the respondent. The alternative is ordinary survey questions, taken one at a time. Survey questions are easier to grasp mentally, but one lacks the possibility to prioritize. We have carried out a survey where both methods are in action, making comparison possible. Four attributes are analyzed: A) Influencing Policy Agendas; B) Communication with other organisations; C) Introducing innovative products/processes and/or services; D) Organisation's Knowledge. The issues of importance, correlations and case clustering are evaluated by both methods. Correspondence between how the two methods measure 'the issue in question' is also given. The data originate from a study carried out during 2002-2005 to provide a thorough assessment of the contribution of "water intermediary" services, emerging between water users and the utilities operating water supply and wastewater disposal systems, to the more rapid implementation of key objectives of EU water policy. The study involved partners in 7 regions of Europe: Bulgaria (Sofia region), Denmark (Copenhagen region), Germany (Berlin region), Greece (Volos Metropolitan area), Hungary (Budapest region), and United Kingdom (Newcastle and Manchester regions). Water Intermediaries are defined as "organizations that act in-between the traditional relationships between utilities, regulators and consumers to enable the uptake of new technologies and changed social practices within the production-consumption relationship to reshape the intensity, timing and level of water use and wastewater production".

**1. Implementing improvements**

*Author:* Jonathan Smyth-Renshaw (JSR & Ass Ltd) *Format:* presentation (Statistical consulting) *Contact:* smythrenshaw@btinternet.com

This presentation will focus on the implementation of improvement into a business. I believe this will be of interest based on my experience in Copenhagen in 2004. Picture the scene: the hard work is done, the analysis is completed and the bonus is due! The time has come to implement the improvement. In all cases (I think), the improvement to ensure the 'root cause' does not return will be one or a combination of the following:

- Eliminate the step from the process which contains the root cause.

- Introduce a Poka-Yoke solution.

- Introduce monitoring and new or updated working methods/standards.

- Build in a reject rate and increased inspection.

- (Redesign the process inputs using DoE (new project).)

- Decline any work using this process due to £/$/€ losses.

I plan to work through each of the above, discussing some of the statistical implications behind each improvement option. I will demonstrate using practical examples where appropriate.

### 6c. DOE and multi-objective optimization

**31. Industrial experiments: modelling split-plots and multiple responses**

*Authors:* Frøydis Bjerke (Matforsk) and Øyvind Langsrud (Matforsk) and Are Halvor Aastveit (UMB - Norwegian University of Life Sciences) *Keywords:* Restricted randomisation, split-plot, multivariate analysis, 50-50 MANOVA, rotation tests, multi-stratum error structure, Type I error *Format:* presentation (Design of experiments) *Contact:* froydis.bjerke@matforsk.no

Two issues regarding the practical use of designed experiments are discussed: restrictions on randomisation and multiple responses. The former is typically related to hard-to-vary factors and natural sequences of factors, e.g. in separate stages of a process experiment. Randomisation restrictions should be taken into account in the construction of the design as well as in the statistical modelling. A case study of sausage production is presented, using a split-plot model with correlated multiple responses. Correlated responses and multiple testing are challenges that should be handled adequately. Both phenomena inflate the Type I error; that is, apparently significant effects may well be due to random noise. In the case study, the correlated multiple responses are handled by a newly developed method, the "50-50 MANOVA" (Langsrud (2002)). Having multiple responses, one may still perform a univariate analysis of each response. Rotation tests, as described by Langsrud (2005), are particularly useful as they adjust the p-values without being too conservative. Practical aspects of the planning, performing, response measurements and statistical analysis are emphasised throughout.

**53. The analysis of split-plot designs in practice**

*Authors:* Peter Goos (Universiteit Antwerpen) and Ivan Langhans (CQ Consultancy); Martina Vandebroek (Katholieke Universiteit Leuven) *Keywords:* containment method, generalized least squares, method of Kenward & Roger, mixed model analysis, Satterthwaite's method, residual method *Format:* presentation (Design of experiments) *Contact:* peter.goos@ua.ac.be

Many industrial response surface experiments are not conducted in a completely randomized fashion because some of the factors investigated in the experiment are not reset independently. Often, this is due to the use of batches in the experiment or the fact that some of the factors involved in the experiment are hard to change. The resulting experimental design is then of the split-plot type and the observations in the experiment are in many cases correlated. A proper analysis of the experimental data is therefore a mixed model analysis involving generalized least squares estimation. Many people, however, analyze the data as if the experiment were completely randomized, and estimate the model using ordinary least squares. The purpose of the presentation is to quantify the differences in conclusions reached from the two methods of analysis and to provide the audience with guidance for analyzing split-plot experiments in practice. It turns out that different options available in commercial software for determining the denominator degrees of freedom for significance tests in the mixed model analysis yield totally different results.
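
The difference between the two analyses can be illustrated on a toy design. The sketch below assumes a balanced split-plot with four whole plots of four runs each, one hard-to-change factor `w`, one easy-to-change factor `s`, and variance components that are treated as known (in practice they would be estimated, for example by REML). The design, the variance values and the function names are all hypothetical; the denominator degrees of freedom methods discussed in the abstract are not covered.

```python
import numpy as np


def split_plot_covariance(whole_plot_ids, var_whole, var_sub):
    """Covariance of the responses: runs in the same whole plot share an error."""
    ids = np.asarray(whole_plot_ids)
    same_plot = (ids[:, None] == ids[None, :]).astype(float)
    return var_whole * same_plot + var_sub * np.eye(len(ids))


def ols(X, y):
    """Ordinary least squares estimate."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def gls(X, y, V):
    """Generalized least squares estimate (X' V^-1 X)^-1 X' V^-1 y."""
    Vinv_X = np.linalg.solve(V, X)
    return np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)


def gls_standard_errors(X, V):
    """Standard errors that respect the split-plot correlation structure."""
    return np.sqrt(np.diag(np.linalg.inv(X.T @ np.linalg.solve(V, X))))


def naive_ols_standard_errors(X, total_var):
    """Standard errors obtained by pretending the runs are independent."""
    return np.sqrt(np.diag(total_var * np.linalg.inv(X.T @ X)))


whole_plot_ids = np.repeat(np.arange(4), 4)   # 4 whole plots, 4 runs each
w = np.repeat([-1.0, 1.0, -1.0, 1.0], 4)      # hard-to-change factor
s = np.tile([-1.0, 1.0, -1.0, 1.0], 4)        # easy-to-change factor
X = np.column_stack([np.ones(16), w, s])
V = split_plot_covariance(whole_plot_ids, var_whole=1.0, var_sub=1.0)

rng = np.random.default_rng(1)
y = rng.multivariate_normal(X @ np.array([10.0, 1.0, 1.0]), V)

beta_ols, beta_gls = ols(X, y), gls(X, y, V)
se_gls = gls_standard_errors(X, V)
se_naive = naive_ols_standard_errors(X, total_var=2.0)
```

In this balanced case the point estimates coincide, but the naive analysis understates the uncertainty of the whole-plot effect and overstates that of the sub-plot effect, which is enough to change significance conclusions.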

**55. Optimal two-level designs for conjoint studies**

*Authors:* Roselinde Kessels (Katholieke Universiteit Leuven) and Prof. Peter Goos (Department of Mathematics, Statistics & Actuarial Sciences, Universiteit Antwerpen, Belgium); Prof. Martina Vandebroek (Department of Applied Economics, Katholieke Universiteit Leuven, Belgium) *Keywords:* conjoint analysis; two-level designs; D-, A-, G-, V-optimality *Format:* presentation (Design of experiments) *Contact:* roselinde.kessels@econ.kuleuven.be

In business, conjoint studies are renowned for yielding a great deal of strategic insight as they allow managers to learn about customer interest in variations of the company's products or services. In a conjoint study, participants are asked to judge a set of profiles that are defined as combinations of the levels of a number of attributes of the product or service under investigation. In our talk, we focus on conjoint studies in which each of the participants receives a different set of profiles to rate. All attributes are dichotomous and the levels of one or more attributes of the profiles in each set are held constant to reduce task complexity. To enhance market realism, the profiles are usually administered in the form of prototypes that have to be tested first. For these conjoint studies, we present conditions for building optimal designs in terms of D-, A-, G- and V-efficiency departing from two-level factorial and Plackett-Burman designs. The conditions arrange the profiles into several groups that are each assigned to a different participant. We provide some examples to illustrate the practical value of the method.

**67. Implementing Asymptotically Optimal Experiments for Treatment Comparison**

*Authors:* Daniele Romano (University of Cagliari) and Alessandro Baldi Antognini and Alessandra Giovagnoli (University of Bologna, Italy) *Keywords:* Sequential design. Optimal design. Maximum likelihood. *Format:* presentation (Design of experiments) *Contact:* romano@dimeca.unica.it

Sequential experiments are widely used in biomedical practice but are also highly desirable in an industrial set-up. These procedures are very flexible since the experimenter can modify the trial as it goes along, on the basis of the previous allocations and/or observations. However, response-adaptive experiments present inferential problems because observations are no longer independent. When the experimental objectives can be defined as an optimization problem, the optimal design often depends on the unknown parameters of the statistical model. For example, in the case of a binary response trial for comparing two treatments T1 and T2, with respective probabilities of success p1 and p2, if we are interested in inferring on their difference, then $\pi^* = \sqrt{p_1(1-p_1)} / \left(\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}\right)$ is the proportion of assignments to T1 which minimizes the variance of the ML estimator and maximizes the power of the corresponding test. In general the maximum likelihood design for comparing v treatments is defined as the one based on the step-by-step updating of the target treatment allocation by ML estimates: this estimate is then used at each step for the randomized allocation of the treatments. It has been shown by Baldi Antognini and Giovagnoli (Sequential Analysis, 2005) that if the responses belong to the exponential family, for any optimality criterion the related ML design is asymptotically optimal and the MLEs of the parameters of interest retain the strong consistency and asymptotic normality properties, as if the observations were independent. In this paper we give a computer program for implementing the ML design under some common models (binomial/normal/Poisson/exponential, etc.) and some of the most widely used optimal targets, and investigate its speed of convergence.
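
A minimal sketch of such a design for two binary treatments is given below. It is not the authors' program: it assumes the variance-minimizing (Neyman) target sqrt(p1 q1) / (sqrt(p1 q1) + sqrt(p2 q2)) with q = 1 - p, and it replaces the raw ML estimates by slightly smoothed ones, (successes + 0.5) / (trials + 1), so that the target is defined from the first step. The function names, the smoothing and the true success probabilities are hypothetical choices.

```python
import math
import random


def neyman_target(p1, p2):
    """Proportion allocated to treatment 1 that minimizes Var(p1_hat - p2_hat)."""
    s1 = math.sqrt(p1 * (1.0 - p1))
    s2 = math.sqrt(p2 * (1.0 - p2))
    return s1 / (s1 + s2)


def run_ml_design(p1_true, p2_true, n_trials, seed=0):
    """Allocate sequentially, re-estimating the target after every response."""
    rng = random.Random(seed)
    true_probs = (p1_true, p2_true)
    counts = [0, 0]
    successes = [0, 0]
    for _ in range(n_trials):
        # Smoothed estimates (an assumption) stay strictly inside (0, 1).
        estimates = [(successes[i] + 0.5) / (counts[i] + 1.0) for i in range(2)]
        prob_treatment_1 = neyman_target(estimates[0], estimates[1])
        arm = 0 if rng.random() < prob_treatment_1 else 1
        response = rng.random() < true_probs[arm]
        counts[arm] += 1
        successes[arm] += int(response)
    return counts, successes


counts, successes = run_ml_design(p1_true=0.5, p2_true=0.9, n_trials=20000, seed=1)
observed_proportion = counts[0] / sum(counts)
```

With these hypothetical success probabilities the target proportion is 0.625, and the observed allocation proportion can be compared against it to gauge the speed of convergence.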

**99. Bayesian sample size determination**

*Authors:* Lourdes Rodero (Technical University of Catalonia (UPC)) and Josep Ginebra (UPC) *Keywords:* Bayesian methods, Expected Value of Sample Information, Exponential Family *Format:* presentation (Design of experiments) *Contact:* lourdes.rodero@upc.edu

Sample size determination is among the most commonly encountered tasks in statistical practice, and it is one of the most fundamental design of experiments questions. In it, one aims at choosing the smallest sample size that allows one to achieve a certain goal. In the frequentist approach, most often one seeks the smallest sample size that achieves a certain power at a specified significance level for testing simple versus simple hypotheses. In our presentation we explore sample size determination in the context of Bayesian decision theory, and in particular, we consider it in the context of estimation under quadratic loss for various distributions in the single parameter exponential family, which includes Normal location, Normal scale, Binomial, Poisson and Gamma. The extension to distributions in the multiple parameter exponential family and to decision problems other than squared error loss estimation will also be considered. Determining sample size is a very important issue in industrial statistics because samples that are too large may waste time, resources and money, while samples that are too small may lead to inaccurate results.
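
For the Normal location case the calculation is short enough to sketch. Under quadratic loss the Bayes estimator is the posterior mean and its Bayes risk is the posterior variance, which for a Normal prior and known sampling variance equals 1 / (1/prior_var + n/sampling_var) regardless of the data. The code below, with hypothetical function names and numerical values, finds the smallest n whose risk meets a target; the target-risk criterion is an assumption, since the abstract does not state the exact goal used.

```python
def posterior_variance_normal(n, prior_var, sampling_var):
    """Bayes risk under quadratic loss for a Normal mean with known variance."""
    return 1.0 / (1.0 / prior_var + n / sampling_var)


def smallest_sample_size(prior_var, sampling_var, target_risk, n_max=10**6):
    """Smallest n whose Bayes risk does not exceed target_risk."""
    for n in range(n_max + 1):
        if posterior_variance_normal(n, prior_var, sampling_var) <= target_risk:
            return n
    raise ValueError("target risk not reachable within n_max observations")


# Hypothetical values: prior variance 1, sampling variance 4, target risk 0.25.
n_needed = smallest_sample_size(prior_var=1.0, sampling_var=4.0, target_risk=0.25)
```

A more informative prior (smaller `prior_var`) lowers the required sample size, which is the main practical difference from the frequentist power calculation.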

### 6d. Six Sigma

**10. Lean Thinking and Six Sigma**

*Authors:* Ronald Does (IBIS UvA) and Henk de Koning (IBIS UvA) *Keywords:* Lean Principles, Six Sigma, Quality Improvement *Format:* presentation (Six Sigma and quality improvement) *Contact:* rjmmdoes@science.uva.nl

Both Lean Thinking and Six Sigma are approaches that currently receive a lot of attention in industry, services as well as in the academic world. Both have a long history. Lean Thinking essentially started just after World War II in Japan. Six Sigma is the culmination of a century long development of the quality management discipline. Both approaches moreover went through, to a certain degree, parallel developments in recent years. Originally started as approaches confined to a small range of industries (i.e. Lean Thinking in automotive and Six Sigma in electronics), their focus is broadening fast. A problem is that Six Sigma as well as Lean Thinking have been around for about as long as the typical "product" life cycle in this business. So, it is necessary for their survival to change and develop. The first response to this circumstance is witnessed in the form of new developments within both approaches. The widening of the application area to service, business transaction and healthcare catches the eye most. In order to branch out, both approaches vigorously fight the legacy of sole industrial application reflected in their conceptual framework, toolkit and the underlying methods with some measure of success. A second way to deal with the challenge of becoming outdated is to combine the two approaches and to utilize the best of both worlds. In both industry and service some movement to combine the two has taken place. In this presentation Lean Thinking and Six Sigma will be separately explored. This will result in a clear picture of the weaknesses and strengths of the two approaches. After this a possible synthesis will be explored. It then will become apparent that both approaches, although in about the same stage of development, have strongly complementary virtues.

**26. 6 Sigma, Lean, 20 keys: have you made your choice yet? Or... a little bit of everything?**

*Authors:* Johan Batsleer (AMELIOR) and Jeroen Vanlerberghe (AMELIOR) *Keywords:* 6 Sigma ; lean ; 20 keys *Format:* presentation (Six Sigma and quality improvement) *Contact:* jb@amelior.be

Nowadays, companies have to strive for excellence if they hope to survive in the long run. Beating the competition is the most important prerequisite in order to make it through today's difficult market situation. There are plenty of models that help you achieve excellence. In this workshop we will shed light on some of them. Many success stories about Six Sigma have come out of the US. The name sounds somewhat mysterious, but Six Sigma provides a uniform system aimed at reducing mistakes to an absolute minimum. For that purpose a strong methodology is used, known as DMAIC (Define, Measure, Analyse, Improve, Control). Companies such as Motorola and General Electric have spread Six Sigma around the globe. However, the success is mostly due to the fact that Six Sigma demands that one sets clear financial objectives. On the other hand, Six Sigma demands the creation of new roles within the company, people whose main assignment is to keep Six Sigma alive. These people are called Black Belts, Green Belts and Yellow Belts. Others seek refuge in the principles of Lean Manufacturing to reach the highest levels of excellence. The Toyota Production System, which is 50 years old, has become legendary and was the basis of Lean Manufacturing. "Lean" is a structured way of eliminating losses. It is a system that removes all unnecessary activities, which must lead to a leaner production. Managers worldwide are debating whether they should apply Six Sigma or Lean. And now, the Japanese are making the discussion even more complex with the introduction of yet another new methodology: the so-called "20 keys". 20 keys is an often used name for 'The Practical Programme of Revolution in Factories' (in short 'PPORF'). The 20 keys introduce the 20 areas in which a company has to improve in order to achieve shorter cycle times, better quality and lower cost. This workshop offers an in-depth comparison of the three systems.
Another important question: Is it better to commit to one system or is it better to choose the best elements from each method and create a good mix?

**51. To Avoid the Fall of Icarus: setting up the Six Sigma projects.**

*Authors:* Kamil Torczewski, Michal Baranowicz (Wroclaw University of Technology) *Keywords:* Six Sigma project, guidelines *Format:* presentation (Six Sigma and quality improvement) *Contact:* k.torczewski@6sigma.pl

Quite often, Six Sigma efforts related to selecting and defining Six Sigma projects resemble the myth of Icarus. Managers want to fly higher, to see more and to reach more... and they expect more. That is the case especially during the first wave of Six Sigma projects. Mistakes like wrong project selection, poor problem definition and poor goal statement are quite common and result in project failures. Such unfulfilled expectations usually cause a lot of frustration among managers and, more importantly, among workers. The Six Sigma implementation process then faces growing resistance and ends in implementation failure. The fall of Icarus is happening. To stand up and attempt to implement the program a second time is much more difficult... and much more expensive. In this article the main guidelines, ideas and experiences that can help to avoid the Icarus case are presented.

**43. Six Sigma, Statistics and Deming**

*Author:* Martin Gibson *Keywords:* Six Sigma, Statistics, Deming *Format:* presentation (Six Sigma and quality improvement) *Contact:* gg1000@waitrose.com

Six Sigma originated in Motorola in the 1980s as an improvement methodology to compete against the best. Regarded as successful in the US, it is now spreading across Europe and at the same time morphing into ever more improvement strategies - Lean Sigma, Innovative Design for Six Sigma, etc. In his 1994 book "The New Economics for industry, government and education", W. Edwards Deming ends by stating "Conformance to specifications, Zero Defects, Six Sigma Quality, and other nostrums, all miss the point (so stated by Donald J. Wheeler, 1992)". Is Deming right or wrong? In this presentation we will examine what Six Sigma is and is not offering for improvement and what methods (statistical and others) we must adopt if we truly want to improve.

**80. A rational reconstruction of Six Sigma's Breakthrough Cookbook**

*Authors:* Henk De Koning (IBIS UvA) and Jeroen de Mast (IBIS UvA) *Keywords:* reconstruction, Hidden factory model, Quality, DMAIC, Breakthrough Cookbook *Format:* presentation (Six Sigma and quality improvement) *Contact:* hkoning@science.uva.nl

Six Sigma is a programme that has gained wide acceptance in industry, but scientific understanding lags behind. A first step in a scientific study of Six Sigma's method is the provision of a crystal clear and consistent formulation. We aim to provide a clear picture of Six Sigma's method in terms of three elements, viz. steps, phases and purpose. This kind of explication of vaguely formulated knowledge is called rational reconstruction. In the presentation we shall critically review the motivations that are usually given in literature for the application of Six Sigma. Apart from anecdotal evidence in the form of showcases, these motivations point to improved cost structure, improved strategic position, or both. Next, descriptions in literature for the DMAIC phases are studied for consistency. Differences in interpretation of the five phases among authors are highlighted, with the aim to arrive at a more consistent and precise formulation of the quintessential ideas. Finally, the concepts and classifications of Six Sigma are also far from clear or agreed on. Therefore concepts and classification used to describe the purpose and method of Six Sigma are defined. The resulting formulation of Six Sigma's breakthrough methodology is intended to be precise and consistent enough to serve as a basis for scientific research.

### 7a. Workshop: Advanced fields of statistical modelling

### 7b. Measurement processes and capability

**5. Measurement agreement**

*Authors:* Jan Engel (CQM) and Rajko Reijnen (CQM) *Keywords:* measurement methods; statistical modelling; statistical decision making *Format:* presentation (Statistical modelling) *Contact:* Engel@cqm.nl

Industry applies precise, but sometimes expensive measurements to judge product quality and process stability. It is often well appreciated when precise and expensive measurements are replaced by cheap and good-enough measurements, but what is good enough? An interesting example is the measurement of the concentration of analytes in chemical substances, where precise chemical measurements are replaced by less precise spectral measurements. The question that naturally arises is about measurement agreement: how well do two methods agree in their measurement results? Note that this question is usually not answered by a regression analysis or by computing a correlation coefficient. The equivalence of measurement methods may be well defined; however, the deviation between them can arise in many ways. As a consequence, many measures for (dis)agreement of measurement methods are reported in the statistical literature. However, the value of such a measure for answering a specific industrial research question is not always clear. In our view, we start with industrial research questions. A relevant question in the above example is to decide on product quality: is the product good or bad? The less precise spectral measurements should give (about) the same decision results as the precise chemical measurements, but what does this mean for the quality requirements on the spectra? We apply a flexible model for bivariate measurements with well interpretable parameters to translate research questions into requirements and hypotheses on model parameters. Finally, we test these hypotheses and find model parameter estimates. Two research questions will be worked out in the presentation.

**48. How to Handle Autocorrelation in Capability Analysis?**

*Authors:* Kerstin Vännman (Luleå University of Technology) and Murat Kulachi *Keywords:* Capability index, autocorrelation, iterative skipping, hypothesis testing *Format:* presentation (Process modelling and control) *Contact:* kerstin.vannman@ltu.se

There is an increasing use of on-line data acquisition systems in industry. This leads to autocorrelated data and implies that the assumption of independent observations has to be abandoned. Most theoretical methods derived so far for capability analysis assume independence. A short overview of the area of capability and autocorrelation will be given. Then a new way to perform capability analysis when data are autocorrelated is presented. This method is based on what can be called "iterative skipping": by skipping a pre-determined number of observations, e.g. considering every 5th observation, the data set is divided into subsets for which the independence assumption may be valid. For each such subset of the data we calculate a capability index. The most widely used capability indices are considered. Under the assumption of normality the traditional tests can be used for each capability index based on the subset. By combining, in a suitable way, the information from each test based on a subset, a new and efficient test procedure is obtained. We discuss different ways of combining the information from the individual tests and make comparisons based on power calculations. Examples will be presented to illustrate the new method.
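
The skipping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names `skip_subsets` and `cpk` are hypothetical, the choice of C_pk as the index, the data and the specification limits are made up for illustration, and the combination of the per-subset tests is not shown.

```python
from statistics import mean, stdev


def skip_subsets(data, k):
    """Split a series into k subsets; subset j holds observations j, j+k, j+2k, ..."""
    return [data[j::k] for j in range(k)]


def cpk(sample, lsl, usl):
    """Estimate C_pk = min(USL - mean, mean - LSL) / (3 * s) from one subset."""
    m = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    return min(usl - m, m - lsl) / (3 * s)


# Illustrative data only: a short series and assumed specification limits.
series = [10.1, 10.3, 10.2, 9.9, 9.8, 10.0, 10.4, 10.2, 9.7, 9.9, 10.1, 10.2]
subsets = skip_subsets(series, k=3)  # every 3rd observation goes to the same subset
indices = [cpk(s, lsl=9.0, usl=11.0) for s in subsets]
```

Each entry of `indices` is the estimate from one subset; how these are combined into a single test is the subject of the talk and is left open here.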

**49. A GRAPHICAL APPROACH TO PROCESS CAPABILITY INDICES FOR ONE-SIDED SPECIFICATION LIMITS**

*Authors:* Malin Albing (Luleå University of Technology) and Kerstin Vännman (Luleå University of Technology) *Keywords:* capability index, process capability plot, one-sided specification interval, graphical method, hypothesis testing *Format:* presentation (Process modelling and control) *Contact:* malin.albing@math.ltu.se

Most of the published articles about process capability focus on the case when the specification interval is two-sided although one-sided specification intervals are also used in industry. We extend the idea of process capability plots from the case of two-sided specification intervals to derive a graphical method to be used when doing capability analysis having one-sided specification limits. The derived process capability plots are based on existing capability indices for one-sided specification limits. Both the cases with and without a target value are investigated. Under the assumption of normality we suggest estimated process capability plots to be used to assess process capability at a given significance level. When an upper specification limit exists it is not uncommon that the smallest possible value of the characteristic is 0 and this also is the best value to obtain. This case can be handled by the graphical method presented and will be discussed in some detail. The presented graphical approach is at the same time helpful when trying to understand if it is the variability, the deviation from target, or both that need to be reduced to improve the capability.

**81. Uncertainty in standard methods of corrosion evaluation**

*Author:* Evgeny Tartakovsky, Ph.D. (Dead Sea Bromine Group) *Keywords:* uncertainty; corrosion *Format:* presentation (Six Sigma and quality improvement) *Contact:* rostik@netvision.net.il

This study was aimed at estimating the uncertainty in a widely accepted ASTM method for corrosion rate evaluation and at elaborating a decision rule for the acceptance of metallic materials as suitable construction materials. This method is based on duplicate analyses of the material under investigation. The rigorous GUM approach is of little use for the uncertainty estimation in this case, since the corrosion experiments are extremely laborious. As an alternative, we chose the approach recently proposed by Magnusson, Näykki, Hovind and Kryssel (NORDTEST, Technical report 537, 2003), which is based on the use of "historical data", readily available and, in this case, furnished by TAMI - Institute for R&D (Israel). The expanded uncertainty of the method was estimated to be 28% in the domain of acceptance of a material as "non-corrodible". Such a high value of uncertainty makes the decision about the use of a material very problematic. Thinking about corrosion as a counterpart of a negative side-effect of medication, we propose complementing the approach of Magnusson and co-workers with the recognized non-inferiority tests widely used in clinical trials.

**84. Future development of international standards for evaluating measurement uncertainty**

*Authors:* Alistair Forbes (National Physical Laboratory) and Peter Harris (NPL), Maurice Cox (NPL) *Keywords:* Measurement uncertainty, testing, SPC, metrology *Format:* presentation (Statistical modelling) *Contact:* alistair.forbes@npl.co.uk

Since the publication of the Guide to the Expression of Uncertainty in Measurement by the ISO in 1993 (the GUM), National Metrology Institutes such as the National Physical Laboratory have made a concerted effort to develop and implement methods for evaluating measurement uncertainty that conform to the main principles of the GUM. The GUM has been very successful in promoting a common understanding of concepts and harmonised procedures. While the main body of the GUM is likely to remain substantially unchanged for the foreseeable future, further development of the GUM will be through supplements covering areas such as numerical methods for the propagation of distributions, evaluating uncertainty associated with vector valued quantities, modelling and least squares methods. There is also interest in recasting (or at least interpreting) the GUM in a Bayesian light. While the GUM is well known inside the metrology community, it is not so well known in communities involved in testing and SPC, for example, for which measurement uncertainty is also important. Standards such as ISO 5725, Accuracy (Trueness and Precision) of Measurement Systems and Results, use a terminology quite different from that of the GUM and there is a danger of a divergence of language, if not of concept, in these communities. In this paper, we discuss the future development of the GUM and look towards bringing more coherence to the future development of standards relating to measurement uncertainty relevant to a broad range of industrial activities. In this, we look for input from the statistical community to ensure that the standards embody sound statistical concepts and approaches.

### 7c. Six Sigma

**20. The Development of a New Scale for Measuring Customer Satisfaction**

*Authors:* Francesca Bassi (University of Padova) and Gianlugi Guido *Keywords:* consumption experience, validity, reliability, latent class models *Format:* presentation (Business and economics) *Contact:* francesca.bassi@unipd.it

Customer satisfaction is traditionally defined by means of the so-called "disconfirmation paradigm", as an evaluation emerging from the post-purchase comparison between product/service performance and customer expectations. This concept has been the topic of recent studies which argue that, although this paradigm must still be considered valid in its basic formulation, it should be extended as regards expectancies: i.e., expectations, which represent cognitive elements with a rational nature, should be considered together with desires, which represent motivational elements associated with personal objectives. Until now, however, the other term of comparison, product performance, has not yet been extended by considering the social, other than material, nature of consumption in affluent societies. The main changes to be considered regard the various stages of consumers' decision-making processes, and are related to: the new company orientation to "customers as products"; the salience of marketing stimuli capable of influencing consumers' expectations; and the increasing integration between products and services, which stimulates consumers' search for intangible elements which could add value to their products and provide consumer experience. This paper follows the research lines of the above-mentioned literature by proposing a different approach to customer satisfaction measurement. The nature of the concept is maintained as an evaluation deriving from a comparative process, but we change, or better, extend the terms to which expectations and desires are compared: from product performance alone to the entire consumption experience. The aim of the present work is to propose a scale to measure customer satisfaction with reference to integrated products and services, in a broader context than simply evaluating product performance, i.e., by measuring aspects involved in pre- and post-purchase stages.
The proposed scale has three versions: for convenience, shopping, and specialty goods. The scale for shopping goods was also administered to a sample of buyers of a specific branded product (i.e., a pair of jeans) and evaluated for validity and reliability. Finally, latent class models are estimated in order to verify whether a separate kind of satisfaction exists for each phase of the consumption experience. Results show that satisfaction is instead a unique concept with statistically significant links with indicators in all phases of the consumption experience.

**59. A Multi-Stream Process Capability Assessment Using A Non Conformity Ratio Based Desirability Index**

*Authors:* Ramzi Telmoudi (University of Dortmund) and Claus Weihs, Franz Hering *Keywords:* Process Capability Index, Desirability, multi-stream process *Format:* presentation (Six Sigma and quality improvement) *Contact:* telmoudi@statistik.uni-dortmund.de

In this paper the desirability index is used as a multivariate process capability index in order to assess the capability of a multi-stream process. The considered desirability index is the geometric mean of some non-conformity ratio based desirability functions. A threshold for capability judgment is proposed. Furthermore, a condition under which the multivariate process capability index respects the "higher the better rule" for uncorrelated quality characteristics is proposed. The performance of the multivariate index is shown through a case study involving a multi-stream screwing process.
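
As a rough sketch of the aggregation step only: the index is a geometric mean of individual desirability values. The abstract does not give the form of the non-conformity ratio based desirability functions, so the function name `desirability_index` is hypothetical and the inputs below are simply assumed to be desirability values already mapped into [0, 1].

```python
import math


def desirability_index(desirabilities):
    """Geometric mean of individual desirability values, each assumed to lie in [0, 1]."""
    if not desirabilities:
        raise ValueError("at least one desirability value is required")
    if any(d < 0 or d > 1 for d in desirabilities):
        raise ValueError("desirability values must lie in [0, 1]")
    return math.prod(desirabilities) ** (1.0 / len(desirabilities))


# Illustrative values for three streams (made-up numbers).
overall = desirability_index([0.9, 0.8, 0.95])
```

One consequence of using a geometric mean is that a single stream with desirability 0 drives the overall index to 0, whatever the other streams achieve.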

**87. Modeling the adoption pattern of clusters of innovations**

*Authors:* Giuliana Battisti (Aston University) and M Colombo L Rabbiosi *Keywords:* Principal components, sample selection, diffusion model, HRMP *Format:* presentation (Business and economics) *Contact:* g.battisti@aston.ac.uk

Human resources management practices do play a major role in increasing the performance of firms (see Brynjolfsson and Hitt, 2000, Whittington et al. 1999, Caroli and Van Reenen 2001, Bresnahan et al. 2002, etc.). These practices are such that they can generate an overall positive net gain from joint adoption even when the net gain from the adoption of one innovation alone is negative (see for example Ichniowski et al. 1997, Whittington et al. 1999 for examples of super-additivity and clusters of innovations). In this paper we model the extent of use of "high performance" human resource management practices via the first principal component concerning the adoption of the following practices: formal team practices/quality circles, job rotation, individual and collective incentive schemes (HRMP). We then use unbalanced panel data to identify the factors driving the decision to adopt and extensively use the innovations by means of a dynamic specification corrected for the high presence of zeros (non-adopters) and the resultant sample selection.

**104. Lean manufacturing or Six Sigma is this simply East vs. West?**

*Authors:* Chris Angus (ISRU) and Colin Herron (School of Mechanical and Systems Engineering, University of Newcastle upon Tyne) *Keywords:* Six Sigma, Lean, Small Medium Enterprise *Format:* presentation (Six Sigma and quality improvement) *Contact:* c.m.angus@ncl.ac.uk

This research attempts to reconcile the methodologies of lean manufacturing with those of Six Sigma. It would be simplistic to accept this as East (Toyota) vs. the West (Motorola); however, current evidence suggests that this may be the case. Adoption of Six Sigma within UK manufacturing appears to be predominantly within American corporations. Attempts have been made to introduce Six Sigma into regional Small Medium Enterprises (SMEs) but with limited success. Knowledge and adoption of at least some of the tools and techniques of lean manufacturing appear to be more widespread. This paper is regionally based in the North East of England and focuses on the work of ISRU and the NorthEast Productivity Alliance (NEPA). It is important to understand the origins of each technique as well as areas of conflict or congruence. There are some established models/tools/techniques for organisational development, which have become accepted within industry and academia. The principal models are Six Sigma, TPM, JIT, TQM and kaizen, which all have the common goal of improving a process. The aim of the work is to identify the areas of compatibility, if any, and propose a holistic model of adoption. An evaluation to establish the understanding and application levels with regard to lean manufacturing and Six Sigma within North East manufacturing will be undertaken. This will include mapping the characteristics of the main tools in question to develop a model to support Six Sigma introduction and a literature review of success and failure with regard to lean manufacturing and Six Sigma.

**107. Stochastic simulation studies of dispatching rules for production scheduling in the capital goods industry**

*Author:* Christian Hicks (Newcastle University Business School) *Keywords:* dispatching rules; capital goods; *Format:* (Business and economics) *Contact:* Chris.Hicks@ncl.ac.uk

Research on dispatching rules has focused upon job shop situations or small assembly environments and ignored the impact of stochastic effects and other operational factors. This paper investigates the use of dispatching rules in stochastic situations using data obtained from a capital goods company that produced three families of complex products. The schedule spanned 18 months and involved the production of 56 products, from 3360 components with 5539 operations performed on 36 resources. The relative performance of dispatching rules and other operational parameters was investigated with a range of uncertainties. A sequential experimental approach was adopted. The first stage was a screening experiment that considered each factor at two levels. This aimed to establish the relationship between manufacturing performance (measured in terms of mean throughput efficiency and mean tardiness for each product family) and the level of operational factors and uncertainty. The second stage considered three levels of uncertainty with eight dispatching rules. Manufacturing performance deteriorated with increasing levels of uncertainty. The shortest operation time first rule generally produced the best results, particularly at product level.

### 8a. Algebraic statistics

Invited papers.

**90. Bowker's test for symmetry and modifications within the algebraic framework**

*Authors:* Anne Krampe (University of Dortmund) and Sonja Kuhnt (University of Dortmund) *Keywords:* Bowker test, algebraic statistics *Format:* presentation () *Contact:* krampe@statistik.uni-dortmund.de

Categorical data can occur in a wide range of statistical applications. If the available data are observed in matched pairs, it is often of interest to examine the differences between the responses. We consider particular tests of axial symmetry in two-way tables. A commonly used procedure is the Bowker test, which is a generalization of the McNemar test. The test decision is based on an approximation by the χ² distribution, which might not be adequate, for example if the table is sparse. Therefore modifications of the test statistic have been proposed. We suggest a test of symmetry based on Bowker's test and Markov chain Monte Carlo methods following the algorithm of Diaconis and Sturmfels (1998). We carry out a simulation study to determine and compare the performance of the simulation test, the Bowker test and two modifications.
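
For orientation, the classical Bowker statistic for a square table of counts can be computed as in the sketch below. It covers only the asymptotic version of the test, not the Markov chain simulation test proposed in the talk. The function name `bowker_statistic` is hypothetical, and skipping off-diagonal pairs whose counts sum to zero (and not counting them in the degrees of freedom) is an assumed convention.

```python
def bowker_statistic(table):
    """Return (statistic, degrees of freedom) for Bowker's test of symmetry.

    table is a square list of lists of counts n[i][j]; the statistic is the
    sum over i < j of (n[i][j] - n[j][i])**2 / (n[i][j] + n[j][i]).
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    stat = 0.0
    df = 0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total == 0:
                continue  # assumption: empty off-diagonal pairs are skipped
            stat += (table[i][j] - table[j][i]) ** 2 / total
            df += 1
    return stat, df


# Illustrative 3x3 table of matched-pair counts (made-up numbers).
stat, df = bowker_statistic([[5, 2, 4], [6, 5, 1], [4, 3, 5]])
```

For a 2x2 table the statistic reduces to the McNemar statistic (b - c)² / (b + c). The returned value would then be compared with a χ² distribution on `df` degrees of freedom, which is the approximation the abstract notes may be poor for sparse tables.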

### 8b. Process models and testing

**34. EWMA control charts for monitoring normal variance**

*Author:* Sven Knoth (Advanced Mask Technology Center) *Keywords:* SPC;EWMA;change point;sigma;ARL;computation *Format:* presentation (Process modelling and control) *Contact:* Sven.Knoth@amtc-dresden.com

Control charts for monitoring normal variance are less widespread and analyzed than those employed for the normal mean. But, there are lots of application areas such as volatility monitoring in finance, surveillance of the measurement performance of high-end gauges etc. Meanwhile, their ARL properties could be determined. The talk will give some more insights in analyzing EWMA (exponentially weighted moving average) charts that are designed for detecting changes in normal variance.
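
A minimal sketch of an EWMA recursion applied to subgroup sample variances is given below. The abstract does not say which statistic is smoothed (S², ln S² and others are used in practice), so smoothing S² is an assumption, as are the function name `ewma`, the smoothing constant, the starting value and the data. Control limits and the ARL computation discussed in the talk are not shown.

```python
from statistics import variance


def ewma(values, lam, start):
    """Return the EWMA sequence z_i = (1 - lam) * z_{i-1} + lam * x_i with z_0 = start."""
    if not 0 < lam <= 1:
        raise ValueError("lam must lie in (0, 1]")
    z = start
    out = []
    for x in values:
        z = (1 - lam) * z + lam * x
        out.append(z)
    return out


# Illustrative subgroups of size 3 (made-up numbers); in-control variance assumed to be 1.0.
subgroups = [[0.1, -0.4, 0.9], [1.2, -0.8, 0.3], [2.5, -1.9, 0.4]]
s2 = [variance(g) for g in subgroups]  # sample variance of each subgroup
chart = ewma(s2, lam=0.1, start=1.0)   # smoothed variance sequence to plot
```

A small `lam` gives long memory and sensitivity to small sustained changes, while `lam = 1` reduces the chart to plotting each subgroup variance on its own.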

**36. On an extension of the McNemar and Stuart-Maxwell tests for the comparison of matched multinomial probabilities**

*Authors:* Daniel Nel (Stellenbosch University) and Mike G. Kenward (London School of Hygiene and Tropical Medicine) *Keywords:* *Format:* presentation (Statistical modelling) *Contact:* dgnel@sun.ac.za

In the McNemar test, the hypothesis of a change between pre- and post-intervention probabilities is tested on the same subjects when only two outcomes are possible in both the pre and post situations. The Stuart-Maxwell test extends this to the situation in which three outcomes are possible in both the pre- and post-intervention situations. Successive McNemar tests can be used to determine the nature of the change detected in the Stuart-Maxwell test in a multiple comparisons setup. In this paper this idea is generalised to any number of possible outcomes in both the pre- and post-intervention situations. A log-linear model is proposed and rewritten in terms of an over-parameterised logistic regression model with constraints to enable estimation of the parameters and testing of the hypothesis of no change. A procedure is presented to detect where change occurred.

**93. Fault diagnosis in the on-line monitoring of a pasteurization process: a comparative study of different strategies.**

*Authors:* Santiago Vidal-Puig (Polytechnic University of Valencia) and Janssen P.M.A.; Sanchis J. (Department of Systems Engineering and Control of the Polytechnic University of Valencia); Ferrer, A. (Department of Statistics and Operations Research and Quality of the Polytechnic University of Valencia) *Keywords:* Fault diagnosis, T2 Hotelling, Contributions, Multivariate Statistical Process Control *Format:* presentation (Process modelling and control) *Contact:* svidalp@eio.upv.es

In this talk several strategies for fault diagnosis in the monitoring of multivariate processes are compared: strategies based on the space of the original variables and those based on projective methods in the latent variable space. The first group was specially developed for simultaneous monitoring of a few variables (4-6), usually related to the quality of processes and products. In our study, we will consider the approaches proposed by Doganaksoy et al. [1], Hawkins [2], and Mason et al. [3,4,5]. In the second group of strategies we will consider the contribution plot [6, 7] as a useful tool to diagnose the original variables responsible for the problem. Both kinds of strategies were tested in a training plant PCT23: a scaled version of a real pasteurization process. The performance of both kinds of strategies is compared during the two common phases involved in the monitoring of any process: Phase I deals with iterative model building and data purging of outliers in order to select the reference data set to fit the in-control process model; this reference model is used in Phase II to monitor new observations in order to detect any new problems that may appear in the process.

References

[1] Doganaksoy, N.; Faltin, F.W.; Tucker, W.T. (1991) Identification of out of control quality characteristics in a multivariate manufacturing environment. Communications in Statistics - Theory and Methods 20 (9), 2775-2790.

[2] Hawkins, D.M. (1991) Multivariate Quality Control Based on Regression-Adjusted Variables. Technometrics 33 (1), 61-75.

[3] Mason, R.L.; Tracy, N.D.; Young, J.C. (1995) Decomposition of T2 for Multivariate Control Chart Interpretation. Journal of Quality Technology 27 (2), 99-108.

[4] Mason, R.L.; Tracy, N.D.; Young, J.C. (1997) A Practical Approach for Interpreting Multivariate T2 Control Chart Signals. Journal of Quality Technology 29 (4), 396-406.

[5] Mason, R.L.; Tracy, N.D.; Young, J.C. (1999) Improving the Sensitivity of the T2 Statistic in Multivariate Process Control. Journal of Quality Technology 31 (2), 155-165.

[6] Kourti, T.; MacGregor, J.F. (1996) Multivariate SPC Methods for Process and Product Monitoring. Journal of Quality Technology 28 (4), 409-427.

**111. An application of acceptance sampling by variables in the presence of noisy observations**

*Authors:* Niels Væver Hartvig (Novo Nordisk A/S) and Henrik Thoning (Novo Nordisk A/S) *Keywords:* Specifications, acceptance sampling by variables, measurement error, aerosol devices *Format:* presentation (Process modelling and control) *Contact:* nvha@novonordisk.com

We describe an application of acceptance sampling by variables under measurement uncertainty, developed for the production of aerosol devices for delivering drugs to the lungs. The release test for inhalation products is traditionally a test for coverage, ensuring sufficiently high delivered dose uniformity. The current FDA guideline for inhalation products [1] prescribes a non-parametric zero-tolerance test, but after recent consensus between FDA and the industry this will be replaced by a parametric tolerance interval test, which is essentially equivalent to acceptance sampling by variables [2,3]. As in-process control, we consider delta-filter weight (DFW) as an indirect method for measuring delivered dose. The method entails collecting the aerosol on a filter and determining the amount of drug deposited by pre and post-weighing the filter on a microbalance. The data will be noisy measurements of emitted dose, as both measurement error from the balance and day-to-day variation affect the precision. We illustrate how specifications for DFW may be determined, and how the day-to-day variation and measurement error affect these. The specifications are designed such that a device passing the in-process control will with a certain confidence also pass a final test based on emitted dose.

References:

[1] Metered Dose Inhaler (MDI) and Dry Powder Inhaler (DPI) Drug Products Chemistry, Manufacturing, and Controls Documentation, CDER/FDA, October 1998, (Docket No. 98D-0997) http://www.fda.gov/cder/guidance/2180dft.pdf.

[2] A Parametric Tolerance Interval Test for Improved Control of Delivered Dose Uniformity of Orally Inhaled and Nasal Drug Products. The International Pharmaceutical Aerosol Consortium on Regulation and Science (IPAC-RS), (2001) http://www.ipacrs.com/PDFs/IPAC-RS_DDU_Proposal.PDF

[3] Acceptance sampling by Variables under Measurement Uncertainty, Melgaard, H. and Thyregod, P., Frontiers in Statistical Quality Control 6, Lenz, H.-J. and Wilrich, P.-Th. (Eds), Physica Verlag, New York (2001).

**112. Waiting until the first occurrence of patterns in a bivariate sequence of trinomial trials**

*Authors:* Sotiris Bersimis (University of Piraeus, Department of Statistics and Insurance Science) and Balakrishnan, N. (Department of Mathematics & Statistics, McMaster University, Canada), Koutras, M. V. (Department of Statistics and Insurance Science, University of Piraeus, Greece) *Keywords:* Waiting Time; Markov Chain Embeddable Random Variables; Quality Control; Process Control; Acceptance Sampling; Multivariate Statistical Process Control; Grouped data; *Format:* presentation (Process modelling and control) *Contact:* sbersim@unipi.gr

In this article, we study the random variable (Rt) related to the waiting time until the first occurrence of a pattern (ei), which belongs to a set (F) of m patterns, in a bivariate sequence of trinomial trials. The study of the waiting time variable (Rt) is accomplished using the well-known Markov chain embedding technique. The set (F) may be characterized as a set of stopping rules of the bivariate sequence. Sets of this form are met in the fields of statistical process control and acceptance sampling. For example, assume that the final quality of the products of an industrial process is characterized by two correlated variables. Furthermore, assume that the two quality characteristics may take the values: (0) for values of the variable close to the target, (2) for values of the variable beyond the acceptance limits, and (1) for values of the variable in an intermediate range. In that case, the acceptance or the rejection of a lot of products may be based on an appropriate acceptance sampling plan built on the random variable (Rt). An additional example of the use of the random variable (Rt) stems from the field of statistical process control using grouped data. It is quite common in industry to resort to grouped observations, especially in cases where registering the exact value of the characteristic of interest is difficult or costly. However, all the research presented to date refers to the univariate case. It is widely accepted that, not only in industry but in other sectors as well, the simultaneous monitoring or control of two or more correlated process characteristics is necessary. A new methodology for handling grouped data which arise from processes involving more than one correlated variable may be established using the random variable (Rt). In that case, the primary interest focuses on the identification of the appropriate set of patterns that will produce an effective scheme for diagnosing an out-of-control state.

### 8c. Workshop: Wild River DoE Workshop

### Poster presentations

**2. SPC for all**

*Author:* Jonathan Smyth-Renshaw (JSR & Ass Ltd) *Keywords:* *Format:* poster *Contact:* smythrenshaw@btinternet.com

**3. Designed experiments in cyclic oxidation testing**

*Author:* Shirley Coleman (ISRU, Stephenson Centre) *Keywords:* latin square interactions outer array *Format:* poster (Design of experiments) *Contact:* shirley.coleman@ncl.ac.uk

**23. RELATIONSHIP BETWEEN A POLLUTANT AND SOME METEOROLOGICAL VARIABLES: A MULTIVARIATE ANALYSIS**

*Authors:* Letizia La Tona (University) and La Tona L., Mondello M., Gargano R. *Keywords:* pollutant, meteorological variables, PCA, CART *Format:* poster (Data mining) *Contact:* latona@unime.it

The study of relationships underlying voluminous data such as air pollution and meteorological records can provide important information regarding the nature of and trends in the data. The present study analyzes, by the use of multivariate methods, eight years of NOx concentration data recorded at seven air-pollution monitoring stations in the industrial zone of Milazzo (ME), Italy, in relation to meteorological variables such as relative humidity, temperature, solar radiation, wind speed, wind direction and quantity of rain. The results of the statistical analysis, using principal component analysis and regression trees, reveal different aspects of the relationships between the pollutant and the meteorological variables considered.

**24. Process Capability Indices: An Introductory Review**

*Author:* md anis (isi) *Keywords:* Specification limits; Non-conforming products; Sample size; Non-normal distribution; Auto-correlated data; Measurement error. *Format:* poster (Six Sigma and quality improvement) *Contact:* mohdzanis@yahoo.com

A review of the four basic process capability indices has been made. The inter-relationship between these indices has been highlighted. Attention has been drawn to their drawbacks. The relation of these indices to the proportion non-conforming has been dwelt upon and the requirement of the adequate sample size has been emphasized. Cautionary remarks on the use of these indices in the case of non-normal distributions, skewed distributions and auto-correlated data are also presented. The effect of measurement error on process capability indices has been highlighted.

**25. Least squares regression based on relative errors**

*Author:* Chris Tofallis (University of Hertfordshire) *Keywords:* least squares, regression *Format:* poster (Statistical modelling) *Contact:* soeqct@herts.ac.uk

The magnitude of an error only becomes meaningful when compared to the observed value i.e. a relative or percentage error is more meaningful than an absolute error. Yet regression, as usually applied, ignores this fact when aggregating errors: an error of two parts in 10 is treated as being equal to an error of two parts in a hundred. We derive and present the equations for regression coefficients based on percentage errors. This is done for both simple and multiple regression.
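
The abstract does not reproduce the equations, so the following is only a sketch of one way to obtain such coefficients for simple regression. It assumes the relative error is taken with respect to the observed value, so that minimising the sum of squared relative errors is a weighted least squares problem with weights 1/y². The function name `relative_ls_fit` is hypothetical and the author's own derivation may differ.

```python
def relative_ls_fit(x, y):
    """Fit y ~ a + b*x by minimising sum(((y_i - a - b*x_i) / y_i) ** 2).

    Equivalent to weighted least squares with weights w_i = 1 / y_i**2.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need equal length of at least 2")
    if any(v == 0 for v in y):
        raise ValueError("relative errors are undefined when an observed y is 0")
    w = [1.0 / (v * v) for v in y]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx  # determinant of the weighted normal equations
    if det == 0:
        raise ValueError("x values do not determine a slope")
    b = (sw * swxy - swx * swy) / det
    a = (swy - b * swx) / sw
    return a, b


# Illustrative data lying exactly on y = 2 + 3x, so the fit recovers a = 2, b = 3.
a, b = relative_ls_fit([1, 2, 3, 4], [5, 8, 11, 14])
```

Because the weights shrink as the observed value grows, a given absolute error counts for more at small observed values than at large ones, which is the behaviour the abstract argues for.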

**39. Full time or Part time - Six Sigma resource dedication dilemma.**

*Author:* Kamil Torczewski (Wroclaw University of Technology) *Keywords:* Six Sigma, black belt *Format:* poster (Six Sigma and quality improvement) *Contact:* k.torczewski@6sigma.pl

During Six Sigma implementation, one of the biggest problems is to develop and effectively support the Six Sigma infrastructure. The problem especially concerns the main element of that infrastructure: a group of well-trained Black Belts. The difficulty is not in training Black Belts but in giving them the one resource that is crucial to the successful completion of Six Sigma projects: time. The cause of the problem is that Black Belt candidates are most often the best people within the organization, and it is very difficult to free 100% of their working time for Six Sigma projects. Because of that, many organizations try to use part-time Black Belts, with varying results. That is why a common question regularly arises: is it absolutely necessary for an organization to assign 100% of a Black Belt's time to Six Sigma projects? In this article, the author tries to answer this question. The Black Belt role and duties are presented, and the pros and cons are discussed.

**46. Combined Methods for Efficient Process Set-up and Optimization**

*Authors:* Roland Göbel (University of Dortmund, Institute of Forming Technology and Lightweight Construction) and Kleiner, Matthias (University of Dortmund, Institute of Forming Technology and Lightweight Construction) *Keywords:* Incremental forming, Case-based reasoning, Design of experiments, Adaptive sequential optimization *Format:* poster (Design of experiments) *Contact:* roland.goebel@iul.uni-dortmund.de

Incremental forming processes like single-point incremental forming or different metal spinning processes are used to form complex geometries in prototyping or low volume production. Due to the high complexity and the large number of possible geometries to be formed, a systematic design of the processes is difficult and time consuming. To increase the efficiency of the process optimization, different methods have been purposefully combined. In a first step, a knowledge-based forecast of where to start the optimization of the process is given by a case-based reasoning approach (CBR). This makes it possible to bring implicitly available background knowledge into the process set-up. Fundamental finite element models (FEM) of these processes are extremely time consuming, but special aspects of the processes can be simulated and have been considered by using CBR. Starting from a first prediction of useful parameter settings, in the next step a model-based optimization using statistical design of experiments (DoE) is performed. In doing so, categorical responses or mixed-level designs, for example, have to be considered and multivariate optimization has to be carried out. Furthermore, where the stable regions in the parameter space are very small, as for example in sheet metal spinning, conventional statistical methods are not applicable. For this reason, an adaptive sequential optimization procedure (ASOP) using space filling designs and spatial regression models has been developed to overcome these limitations. The application of these methods to incremental forming processes will be illustrated with some examples.

**57. The comparison of three types of data transformation to a designed experiment**

*Authors:* Matthew Linsley (ISRU) and E. Monness, I.E. Garzon *Keywords:* Transformations, Signal-Noise Ratio, Box-Cox, separation, location, dispersion *Format:* poster (Design of experiments) *Contact:* M.J.Linsley@ncl.ac.uk

This paper reviews the application of three forms of data transformation to help determine the optimum operating conditions for a milling machine with respect to surface finish. The data in this paper were obtained from a six-factor full factorial (2^6) designed experiment. The six factors (tool speed, work piece speed, depth of cut, coolant, direction of cut and number of cuts) were each varied at two levels. Each treatment was repeated 8 times, so the mean and standard deviation of each treatment were available. In the original study, a 'smaller-the-better' signal-to-noise ratio (SNR) was applied to the experimental data. However, the transformation did not take dispersion into account. An alternative method of transformation is introduced that successfully separates the process location from dispersion. Predictive models are selected that allow optimisation of both the mean and standard deviation responses. The Box-Cox method is then applied to determine a suitable data transformation. It is demonstrated that applying this method stabilises the variance across the fitted values of the predictive model for the mean values. A more efficient, resolution IV fractional factorial design is then produced and the results compared to those obtained from the full factorial design. For the fractional design, the application of the Box-Cox method suggests the use of a different transformation.
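The point about the SNR and dispersion can be made concrete with a small sketch. It assumes the usual Taguchi smaller-the-better definition, SNR = -10 log10(mean(y^2)), which the abstract does not spell out, and it uses made-up replicate values rather than the milling data. Because mean(y^2) equals the squared mean plus (n-1)/n times the sample variance, two treatments with quite different location and spread can share the same SNR, which is why modelling the mean and the standard deviation separately can be more informative.

```python
import numpy as np

def smaller_the_better_snr(y):
    """Assumed Taguchi smaller-the-better SNR: -10 * log10(mean(y^2))."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(y ** 2))

def location_and_dispersion(y):
    """Separate summaries: sample mean and sample standard deviation."""
    y = np.asarray(y, dtype=float)
    return y.mean(), y.std(ddof=1)

# Hypothetical surface finish readings, 8 replicates per treatment.
treatment_a = np.array([1.0, 3.0] * 4)          # low mean, large spread
b_low = 2.2
b_high = np.sqrt(10.0 - b_low ** 2)             # chosen so that mean(y^2) = 5
treatment_b = np.array([b_low, b_high] * 4)     # higher mean, small spread

snr_a = smaller_the_better_snr(treatment_a)
snr_b = smaller_the_better_snr(treatment_b)
mean_a, sd_a = location_and_dispersion(treatment_a)
mean_b, sd_b = location_and_dispersion(treatment_b)

print(f"A: SNR={snr_a:.3f} dB  mean={mean_a:.3f}  sd={sd_a:.3f}")
print(f"B: SNR={snr_b:.3f} dB  mean={mean_b:.3f}  sd={sd_b:.3f}")
```

The two treatments are indistinguishable on the SNR scale, while the separate mean and standard deviation responses show that A is centred lower but varies far more than B.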

**62. Minimum Risk Equivariant Estimator of the Generalized Variance**

*Authors:* Nahid Sanjari Farsipour (Shiraz University) and H. Zakrzadeh *Keywords:* Admissibility, equivariant, monotone likelihood ratio, improved estimator. *Format:* poster (Statistical modelling) *Contact:* nsf@susc.ac.ir

In this paper the best affine equivariant estimator of the generalized variance of a nonsingular normal distribution with unknown mean under the squared log error loss function is considered. Some improved estimators are derived.

**105. Statistical methods for comparing in vitro dissolution profiles of drug products**

*Authors:* Heli Rita (Orion Pharma) and Anni Liimatainen (Orion Pharma) *Keywords:* statistics, dissolution *Format:* poster (Statistical modelling) *Contact:* heli.rita@orionpharma.com

An in vitro dissolution test is used to describe the release of an active ingredient into a solvent from a tablet or capsule drug product. The result of a test is presented as a cumulative series of the proportion dissolved (%) of the amount of active ingredient. For each product, a dissolution test is developed during product development by adjusting the method and the properties of the solvent. The test should be sufficiently discriminative to reflect relevant differences between formulations and changes in the manufacturing process and conditions. In vitro dissolution test results may be used as a surrogate for in vivo absorption, which is commonly measured in human bioequivalence studies. The similarity of in vitro dissolution profiles may be shown between formulations instead of showing the formulations to be equivalent in expensive bioequivalence studies. A wide range of methods for comparing dissolution profiles has been described. They can be classified into ANOVA-based, model-independent and model-dependent methods. The estimates reflecting the most relevant features of the profiles include mean dissolution time (MDT) and the rate of dissolution. The area under the dissolution-time curve, or its proportional form, dissolution efficiency (DE), is considered a proper indicator of both rate and extent of dissolution. Properties of different methods for comparing dissolution profiles and for showing their similarity were evaluated using data from a systematically designed experiment.
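As an illustration of two of the model-independent summaries named above, the following sketch computes DE and MDT for a single profile. It assumes the commonly used definitions (DE as the trapezoidal area under the curve expressed as a percentage of the rectangle for 100% dissolution over the same time span; MDT as the increment-weighted mean of interval midpoints). The abstract does not state which formulas the authors used, and the profile values are invented.

```python
import numpy as np

def dissolution_efficiency(t, pct):
    """DE (%): trapezoidal AUC divided by the 100% x time-span rectangle."""
    t = np.asarray(t, dtype=float)
    pct = np.asarray(pct, dtype=float)
    auc = np.sum(np.diff(t) * (pct[1:] + pct[:-1]) / 2.0)
    return 100.0 * auc / (100.0 * (t[-1] - t[0]))

def mean_dissolution_time(t, pct):
    """MDT: interval midpoints weighted by the amount dissolved per interval."""
    t = np.asarray(t, dtype=float)
    pct = np.asarray(pct, dtype=float)
    increments = np.diff(pct)
    midpoints = (t[1:] + t[:-1]) / 2.0
    return np.sum(midpoints * increments) / np.sum(increments)

# Hypothetical cumulative profile: minutes and % dissolved.
time_min = [0, 10, 20, 30]
dissolved = [0, 50, 100, 100]

de = dissolution_efficiency(time_min, dissolved)
mdt = mean_dissolution_time(time_min, dissolved)
print(f"DE = {de:.2f} %, MDT = {mdt:.2f} min")
```

For this profile the area is 2000 %·min out of a possible 3000, so DE is about 66.7%, and MDT is 10 minutes.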

**44. The Analysis and Monitoring of Linear Profiles**

*Authors:* Stelios Psarakis (Athens University of Economics and Business, Dept of Statistics) and J. Koulouriotis *Keywords:* Process control, Linear profiles *Format:* poster (Process modelling and control) *Contact:* psarakis@aueb.gr

In 1924, Shewhart presented the technique of statistical process control (SPC) and gave industry the technical tools for checking and testing the quality of a product or process, with quality improvement as the main goal. There are many practical situations in which quality is well characterized by a relationship between a response variable and one or more explanatory variables. This relationship can be represented by a profile (curve). In some applications the profile is well represented by a simple linear regression model, but often more complicated models are required. The recent literature contains studies on monitoring processes characterized by simple linear regression profiles. These studies analyse the Phase I and Phase II procedures for linear profiles. In this paper we give an overview of these studies. Specifically, we examine the methods for Phase I and Phase II analysis of linear profiles. We also propose a new method for Phase II analysis in the case of a general linear profile and examine its performance using simulation. This method can also be applied in the case of non-linear models.

**58. Experiences on the impact of data uncertainties and user choices on factorial and Taguchi experiments**

*Authors:* Matthew Linsley (ISRU) and MJ Linsley, IE Garzon, J Otte, PJ Wright and A Anderson *Keywords:* Design of Experiments *Format:* poster (Design of experiments) *Contact:* M.J.Linsley@ncl.ac.uk

When undertaking a Taguchi or other designed experiment, there are many practical issues in terms of both application and interpretation. Some of these are explored by examining a series of studies intended to improve the surface finish quality of milled aluminium components. Three issues are explored: (i) What is the impact of the normal experimental uncertainty involved in physically measuring a quality response? It is shown, firstly, that poor measurement certainly gives meaningless outcomes (hardly a surprise) but, secondly, that even very careful measurement has intrinsic measurement uncertainties that may affect the outcome of subsequent statistical analysis. (ii) What impact does the structure of the experiment design have? Does it affect the outcomes if slightly different factors and levels are used (especially with the common, because economical, two-level designs)? It is shown that the statistical outcomes have the expected broad similarities, but also specific significant differences. (iii) If a Taguchi fractional experiment is chosen, does it matter if, perhaps through initial ignorance or misinformation about the process being studied, the "wrong" factor allocation to columns is made? It is shown that the statistical outcomes are different, but not as greatly so as might be feared, suggesting some robustness in the Taguchi experiment array concept. The conclusion is that broad confidence can be had in these methods from a practical point of view, but this must be tempered by an appreciation of the impact on them of both intrinsic measurement uncertainties and user choices in applying them.

**89. Detection of outliers in multivariate and structured data sets**

*Author:* Sonja Kuhnt (Department of Statistics, University of Dortmund) *Keywords:* Outlier, outlier identification, multivariate data *Format:* poster (Statistical modelling) *Contact:* kuhnt@statistik.uni-dortmund.de

Observations which deviate strongly from the main part of the data can occur in every statistical analysis. These observations, usually labelled as outliers, may cause completely misleading results when standard methods are used, but can also contain information about special events or dependencies. Although the problem of how to identify and handle outliers has been the subject of numerous investigations, no globally accepted definition of outliers exists. A formal definition of outliers for most data and model situations can be derived from the concept of so-called alpha-outliers. We discuss cases of multivariate and structured data situations, where a model from the class of generalized linear models or graphical models is considered. Generalized linear models essentially extend the classical linear model in two ways: data are not necessarily assumed to be normally distributed, and it is not the mean itself but some function of the mean that is modelled as a linear combination of certain covariates. Graphical models combine multivariate statistical models with a representation of conditional independence properties by a mathematical graph. For all considered model situations, one-step outlier identification procedures based on robust estimators are presented. For a data example of annoyance resulting from multiple transportation noise, we discuss the use of outlier identification in the model building process.
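To give a feel for the alpha-outlier idea, here is a minimal sketch for the simplest case only: a univariate normal model, where the alpha-outlier region is taken to be the set of points further than z(1 - alpha/2) standard deviations from the mean. The one-step identifier plugs in the median and the scaled MAD as robust estimates. The sample-size adjustment alpha_n = 1 - (1 - alpha)^(1/n) is one common choice and is an assumption here, as is the data; the procedures for generalized linear and graphical models discussed in the abstract are more involved.

```python
from statistics import NormalDist, median

def one_step_outliers(x, alpha=0.05):
    """Flag points in the estimated alpha_n-outlier region of a normal model.

    Location and scale are estimated robustly (median, scaled MAD), and the
    region is computed once, without iterating after removing points.
    """
    n = len(x)
    alpha_n = 1.0 - (1.0 - alpha) ** (1.0 / n)   # assumed per-point level
    z = NormalDist().inv_cdf(1.0 - alpha_n / 2.0)
    centre = median(x)
    # 1.4826 makes the MAD consistent for the normal standard deviation.
    scale = 1.4826 * median([abs(v - centre) for v in x])
    return [v for v in x if abs(v - centre) > z * scale]

# Hypothetical measurements with one gross error.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 15.0]
flagged = one_step_outliers(data)
print(flagged)
```

Using the mean and standard deviation instead of the median and MAD would inflate the scale estimate in the presence of the gross error, which is the usual motivation for robust plug-in estimators.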

**92. Comparing different approaches for on-line batch process monitoring: application to wastewater treatment**

*Authors:* Nuria Portillo-Poblador (Universidad Politécnica de Valencia) and Petra Janssen (Department of Industrial and Applied Mathematics, Technical University of Eindhoven (The Netherlands)), Daniel Aguado (Department of Hydraulic Engineering and Environment, Technical University of Valencia (Spain)) and Alberto Ferrer (Departmen *Keywords:* Fault detection, Multivariate statistical batch process control (Batch MSPC), Multiway principal component analysis (MPCA), Multiway partial least squares (MPLS), Wastewater *Format:* poster (Process modelling and control) *Contact:* nportillo@eio.upv.es

Typical batch process data consist of several variables measured over the duration of the batch. These data can be arranged in a three-way matrix (X), which has three modes or variability directions: batch, variable and time. Several methods for on-line batch process monitoring have been proposed. Nomikos and MacGregor (1995) developed a methodology based on unfolding the three-way matrix into a two-way structure, preserving the batch mode and unfolding the variable and time modes. For on-line monitoring it is necessary to estimate or "fill in" the unknown future observations of the new batch. Several procedures can be used: filling in with zero deviation from the mean trajectory of each variable, filling in with the current deviation from the mean trajectory of each variable, and estimating the unknown future trajectories using the PCA model (missing data option). Recently, another method to fill in the unknown future observations has been proposed (Cho and Kim, 2003). In this case, the batches from the historical data set which are most similar to the monitored batch are used to fill in the future observations. On the other hand, there are several approaches in which the filling-in procedure is not required. One of them is the Wold et al. (1998) approach, in which the variable direction is maintained while the batch and time modes are unfolded. Rännar et al. (1998) proposed an adaptive, recursive approach to update multi-block (hierarchical) PCA/PLS models. Data from a laboratory sequencing batch reactor (SBR) for wastewater treatment will be used to compare the above multivariate statistical batch process control techniques in terms of their effectiveness for monitoring and fault diagnosis.
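The two unfolding schemes and the two simplest fill-in rules described above can be sketched with NumPy array reshaping. The array sizes, the axis order (batch, variable, time) and the helper name `fill_in` are assumptions made for illustration; the PCA-based and nearest-batch fill-in options are not shown.

```python
import numpy as np

I, J, K = 4, 3, 5   # hypothetical numbers of batches, variables, time points
X = np.arange(I * J * K, dtype=float).reshape(I, J, K)

# Batch-wise unfolding: one row per batch, with the J variables at time 1
# followed by the J variables at time 2, and so on (I x KJ).
X_batch = X.transpose(0, 2, 1).reshape(I, K * J)

# Variable-wise unfolding: one row per (batch, time) pair and one column
# per variable (IK x J), so no future values are needed on-line.
X_var = X.transpose(0, 2, 1).reshape(I * K, J)

def fill_in(dev_so_far, K, method="zero"):
    """Complete a new batch so it matches the batch-wise unfolded layout.

    dev_so_far: (k, J) deviations from the mean trajectories observed so far.
    method "zero" assumes future deviations are zero; "current" assumes the
    latest deviation persists until the end of the batch.
    """
    k, J = dev_so_far.shape
    if method == "zero":
        future = np.zeros((K - k, J))
    elif method == "current":
        future = np.tile(dev_so_far[-1], (K - k, 1))
    else:
        raise ValueError("method must be 'zero' or 'current'")
    return np.vstack([dev_so_far, future]).reshape(1, K * J)

# A new batch observed for 2 of the 5 time points (made-up deviations).
observed = np.array([[0.1, -0.2, 0.0],
                     [0.3, -0.1, 0.2]])
row_zero = fill_in(observed, K, "zero")
row_current = fill_in(observed, K, "current")
```

The completed row has the same column layout as `X_batch`, so it can be projected onto a model fitted to the batch-wise unfolded historical data.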

**97. A Probabilistic-Fuzzy Approach for TQSEM**

*Authors:* Grigore Albeanu (Unesco IT Chair, University of Oradea) and GRIGORE ALBEANU (UNESCO IT Chair, Univ. Oradea), FLORIN POPENTIU-VLADICESCU (UNESCO IT Chair, Univ. Oradea), POUL THYREGOD (Technical University of Denmark, IMM), HENRIK MADSEN (Technical University of Denmark, IMM) *Keywords:* Software engineering management, CMM, Bayesian analysis, probabilistic fuzzy *Format:* presentation (Process modelling and control) *Contact:* galbeanu@fmi.unibuc.ro

Software development, like project management in general, presents both an opportunity and a threat. To obtain good software projects, a methodology based on TQSEM (Total Quality for Software Engineering Management) needs to be applied. To measure the capability of a software organization or a research team, a standard methodology has to be used. There are many approaches to assessing the capability of such a team. This contribution considers the CMM approach and applies the Maturity Questionnaire developed by the SEI (Software Engineering Institute, Carnegie Mellon University), extended with fuzzy linguistic variables. Both a deterministic and a probabilistic-fuzzy strategy are used to establish the maturity level. A probabilistic approach, mainly based on Bayesian analysis, and a fuzzy approach based on the minimum information principle (MIP) are given in the main section. An analysis of the development of the software project PoLogCem, designed under NATO-STI Collaborative Linkage Grant EST.CLG.979542, is also provided.

**113. How to fit professional development into our busy lives: Continuing education voucher systems**

*Author:* Helle Rootzén (Informatics and Mathematical Modelling, Technical University of Denmark) *Keywords:* Learning objects, blended learning, statistics, research-based continuing education *Format:* poster () *Contact:* hero@imm.dtu.dk

Learning should be fun as well as inspiring and innovative. What you learn should be directly applicable to your daily work and should help you see things in a new perspective, and your studies should fit into a busy life. The world around us changes so fast that life-long learning is a prerequisite for possessing the competencies demanded by the business sector. Today, data analysis is used in practically all areas of society and plays an important role in almost any company. Many employees find it important to be familiar with data analysis and able to apply statistical methods – competencies that will increase the quality of their company and save it considerable expense. So we need a new type of continuing education that will reflect a rethinking of content, form and duration. In the future, continuing education will be in the form of voucher systems. You may attend the specific chunk of a study programme you require whenever it suits you and pay only for what you get. We have proposed a new type of research-based continuing education courses in statistics. These courses are structured around 'learning objects', i.e. short complete education sessions, which may be combined in various ways according to the students' interests and levels. We combine them with 'blended learning', i.e. a combination of e-learning, web-based learning and face-to-face learning.