ENBIS-13 in Ankara15 – 19 September 2013 Abstract submission: 5 February – 5 June 2013
The following abstracts have been accepted for this event:
A Data Mining Approach for Identifying Yield Loss Causes in the Semiconductor Industry
Authors: Hasna Barkia (Ecole des Mines de Saint Etienne), Xavier Boucher (Ecole des Mines de Saint Etienne)
Primary area of focus / application: Mining
Keywords: Data mining, Yield loss cause identification, Semiconductor industry, Clustering, Association rules mining
Submitted at 16-Apr-2013 10:22 by BARKIA Hasna
To identify yield loss causes for a product "X", we propose a three-step approach that produces a set of relational patterns corresponding to potential yield loss causes. These patterns are generated by data mining algorithms as associations among descriptive clusters that characterize each of the databases considered.
First, context identification: using engineering knowledge, we specify a set of the most critical production steps for product "X" wafers. Let "Mi" be one of these sequences, a succession of production steps "Pi" and quality inspection steps "Qi".
Second, data analysis, divided into two sub-steps.
The first sub-step is cluster identification. From the dataset "dataPi", corresponding to the equipment data collected at a specific process step, we identify clusters corresponding to the different operation modes of the tools at this step. Likewise, "dataQi" is used to identify clusters corresponding to different wafer quality statuses. We use clustering algorithms to identify these clusters.
The second sub-step is the identification of relational patterns between the clusters found previously. We use association rule mining algorithms to identify relations of the form "cap->dbq" between a production-step cluster "cap" and a quality-measurement-step cluster "dbq".
The last step is interpretation: starting from the set of association rules identified previously, and with the help of specialized engineers, we determine which of the quality clusters "dbq" correspond to a potential yield loss situation and identify the different causes "cap" that lead to each such cluster "dbq".
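The rule-quality measures behind the second sub-step can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the cluster labels, the wafer data and the names "cap1"/"dbq1" are hypothetical, and only the standard support and confidence metrics of association rule mining are shown.

```python
def rule_metrics(process_labels, quality_labels, cap, dbq):
    """Support and confidence of the association rule cap -> dbq,
    computed over wafers that carry both a process-step cluster label
    and a quality-step cluster label."""
    pairs = list(zip(process_labels, quality_labels))
    n = len(pairs)
    n_cap = sum(1 for p, _ in pairs if p == cap)
    n_both = sum(1 for p, q in pairs if p == cap and q == dbq)
    support = n_both / n
    confidence = n_both / n_cap if n_cap else 0.0
    return support, confidence

# Hypothetical cluster assignments for 8 wafers of product "X".
proc = ["cap1", "cap1", "cap2", "cap1", "cap2", "cap1", "cap2", "cap1"]
qual = ["dbq1", "dbq1", "dbq2", "dbq1", "dbq1", "dbq1", "dbq2", "dbq2"]
s, c = rule_metrics(proc, qual, "cap1", "dbq1")  # support 0.5, confidence 0.8
```

A rule with high confidence links an operation mode "cap" of a tool to a quality cluster "dbq", which is exactly the kind of pattern the interpretation step then reviews with the engineers.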
Multi-Objective Optimization based on Multivariate Mean-Covariance Models
Authors: Nikolaus Rudak (TU Dortmund University), Sonja Kuhnt (TU Dortmund University)
Primary area of focus / application: Design and analysis of experiments
Keywords: Multiple responses, Optimization, Robust parameter design, Double generalized linear model
Submitted at 16-Apr-2013 15:11 by Nikolaus Rudak
Pourahmadi, M. (1999), "Joint mean-covariance models with applications to longitudinal data: unconstrained reparametrisation", Biometrika, 86: 677-690.
Pourahmadi, M. (2000), "Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix", Biometrika, 87: 425-435.
Kuhnt, S. and Erdbruegge, M. (2004), "A strategy of robust parameter design for multiple responses", Statistical Modelling, 4: 249-264.
Erdbrügge, M., Kuhnt, S., Rudak, N. (2011), "Joint optimization of independent multiple responses based on loss functions", Quality and Reliability Engineering International, 27: 689–703. doi: 10.1002/qre.1229
Hoff, P.D. and Niu, X. (2012), "A Covariance Regression Model", Statistica Sinica, 22: 729-753.
Kuhnt, S. and Rudak, N. (2013), "Simultaneous Optimization of Multiple Responses with the R-Package JOP", to appear in Journal of Statistical Software.
Nonparametric Bayesian Estimation of a POD Function
Authors: Merlin Keller (EDF)
Primary area of focus / application: Other: French special session
Keywords: Structural reliability, Uncertainty analysis, POD function, Bayesian estimation, Dirichlet process mixture model
Submitted at 17-Apr-2013 14:05 by Merlin Keller
In this structural reliability context, a key input parameter of the physical model of a steel component is the distribution of the sizes of non-evolving fabrication flaws.
This distribution can be estimated from a combination of measurements from destructive lab experiments, seen as a perfect sample of the target distribution, and measurements from periodic in-service inspections, which are plagued by noise and limited numerical precision. Moreover, during these inspections, any given flaw may or may not be detected, with a probability depending on its size. This crucial characteristic of the measurement process is modeled as an increasing function of the flaw size, known as the probability of detection (POD).
POD functions are often modeled by assuming a parametric form, such as a log-normal or logistic CDF. However, we show that in certain cases such models are invalidated by the data at hand, and we propose instead a much more flexible way of estimating the POD function that does not assume any predefined shape. We then demonstrate the impact of POD modeling on the ensuing reliability analysis, on simulated as well as real datasets.
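For concreteness, the parametric baseline that such an analysis starts from can be written down directly. The sketch below is purely illustrative: the parameter values mu and sigma and the flaw sizes are made up, and the nonparametric Dirichlet process alternative proposed in the talk is not shown.

```python
import math

def pod_lognormal(a, mu, sigma):
    """Parametric POD curve: probability of detecting a flaw of size a > 0,
    modeled as the CDF of a log-normal distribution (increasing in a)."""
    return 0.5 * (1.0 + math.erf((math.log(a) - mu) / (sigma * math.sqrt(2.0))))

# Evaluate the curve on a few hypothetical flaw sizes.
sizes = [0.5, 1.0, 2.0, 4.0, 8.0]
pods = [pod_lognormal(a, mu=0.7, sigma=0.5) for a in sizes]
```

By construction the curve is monotone increasing in the flaw size and passes through 0.5 at a = exp(mu); it is precisely this rigid two-parameter shape that a nonparametric estimate relaxes.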
Big Data: How Tourism and Other Industries Might Benefit
Authors: Andrea Ahlemeyer-Stubbe
Primary area of focus / application: Business
Keywords: Data Mining, Big Data, Predictive Modelling, Data Integration
Submitted at 17-Apr-2013 17:40 by Andrea Ahlemeyer-Stubbe
These data pools can be used to obtain higher-quality information than the data obtained from old-fashioned booking and communication processes. This is where Big Data comes in.
Apart from data storage, you will need data mining techniques: it is possible to "mine" data pools for "hidden" insights and information. These insights give you a major advantage over your competitors in attracting new customers and turning them into frequent guests and loyal ones.
Insights paired with individualized, data-driven communication are crucial for the future development of the tourism sector.
Especially in sales and marketing, knowledge of future customer behavior as well as customer intentions and desires will make the difference between failure and success for any campaign.
Parallel Optimization of a Sheet Metal Forming Process based on FANOVA Graph Decomposition
Authors: Momchil Ivanov (TU Dortmund University)
Primary area of focus / application: Design and analysis of experiments
Keywords: parallel black-box optimization, EGO, metamodels, FANOVA graph decomposition, sheet metal forming
Submitted at 18-Apr-2013 15:04 by Momchil Ivanov
This presentation describes an elegant way to construct a parallel optimization procedure based on the FANOVA graph, a technique from the sensitivity analysis toolbox. The FANOVA graph procedure is able to detect the additive structure of a function. Given such an additive interaction structure, which is often present in real applications, the additive blocks can be optimized simultaneously and independently of each other, e.g. using EGO. This allows the experimenter to use several simulators and to perform the optimization in parallel.
The talk presents the application of the described parallelization technique and discusses problems that may arise in the process. Different models, such as kernel interpolation, are also discussed as a basis for metamodel-driven optimization. Finally, the procedure is tested on a sheet metal forming simulation. Sheet metal forming is used for the production of automobile body parts; unfortunately, defects such as tearing, wrinkling or springback are frequently observed. It is shown that the new procedure allows for an efficient process optimization, leading to reduced sheet tearing.
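The key property exploited above is easy to demonstrate: when the objective decomposes into additive blocks with no interactions between blocks, each block can be minimized on its own (and hence in parallel), and the block minima simply add up to the global minimum. The toy functions g and h below are invented for illustration, and a coarse grid search stands in for EGO.

```python
from itertools import product

def g(x1, x2):
    """First additive block; its interaction (x1 * x2) stays inside the block."""
    return (x1 - 1.0) ** 2 + (x2 + 0.5) ** 2 + x1 * x2

def h(x3, x4):
    """Second additive block, independent of x1 and x2."""
    return (x3 - 2.0) ** 2 + abs(x4)

def grid_min(block, grid):
    """Minimize one additive block over a coarse grid.
    Each call is independent, so the blocks can run on separate simulators."""
    return min(block(*p) for p in product(grid, repeat=2))

grid = [i / 2.0 for i in range(-6, 7)]          # -3.0 .. 3.0 in steps of 0.5
best = grid_min(g, grid) + grid_min(h, grid)    # additivity: block minima add up
joint = min(g(a, b) + h(c, d) for a, b, c, d in product(grid, repeat=4))
```

The joint four-dimensional search needs 13**4 evaluations, while the two block searches need 2 * 13**2 and return the same optimum, which is the efficiency gain the parallel procedure builds on.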
Using Conditional Linearity in Nonlinear Regression Analysis for Short-Term Prediction of High Water Levels
Authors: Alessandro Di Bucchianico (Eindhoven University of Technology), Jan-Rolf Hendriks (Dutch Ministry of Transport, Public Works and Water Management), Krijn Saman (Dutch Ministry of Transport, Public Works and Water Management)
Primary area of focus / application: Modelling
Keywords: short-term prediction, water level, nonlinear regression, conditional linearity, singular gradient, variable projection algorithm
Submitted at 19-Apr-2013 09:21 by Alessandro Di Bucchianico
The south-west of the Netherlands is protected against flooding by the Delta Plan, a protection system consisting of dikes and dams. An important part of the Delta Plan is the Eastern Scheldt Storm Surge Barrier, a bridge-like construction that can be closed when water levels become too high.
When water levels rise above 275 cm, a Decision Team is physically present at the barrier to close it at the right moment. In order to keep flooding risk at acceptable levels, the barrier automatically closes and takes over full control when there is a water level prediction of exactly 300 cm. In order to avoid environmental damage, the barrier is not allowed to close at predicted water levels below 300 cm. It is thus important for the Decision Team to avoid unnecessary closures of the barrier as well as to prevent the barrier from taking over full control.
The Decision Team has access to several long-term water level predictions. As an addition to these existing predictions, we are developing short-term prediction models. Part of these short-term prediction models is a nonlinear regression model with a quadratic polynomial to model the trend and various sine waves to model oscillations. A straightforward application of nonlinear regression to water levels measured every 10 seconds during the past 30 minutes surprisingly failed because of singular gradients. Appropriate scaling of the variables involved had a clear effect but was not sufficient to overcome the numerical problems. We will show how exploiting the fact that the model has conditionally linear parameters solves all numerical problems. Implementations of the so-called variable projection algorithm of Golub and Pereyra in R and Matlab will be presented in order to illustrate how to use conditional linearity in practice.
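The idea of exploiting conditionally linear parameters can be sketched as follows: for a fixed oscillation frequency, the trend and sine-wave coefficients enter the model linearly and can be solved for exactly, leaving only the frequency as a nonlinear parameter. The sketch below, with made-up data and a single sine wave, illustrates this separation in plain Python; it is not the Golub and Pereyra implementation the authors will present, and the frequency grid is a stand-in for a proper outer optimizer.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def fit_linear(ts, ys, w):
    """For a fixed frequency w the model
    y = b0 + b1 t + b2 t^2 + b3 sin(w t) + b4 cos(w t)
    is linear in b: solve the normal equations and return the residual sum."""
    A = [[1.0, t, t * t, math.sin(w * t), math.cos(w * t)] for t in ts]
    AtA = [[sum(row[r] * row[c] for row in A) for c in range(5)] for r in range(5)]
    Aty = [sum(A[i][r] * ys[i] for i in range(len(ts))) for r in range(5)]
    beta = solve(AtA, Aty)
    return sum((ys[i] - sum(A[i][c] * beta[c] for c in range(5))) ** 2
               for i in range(len(ts)))

# Synthetic series: quadratic trend plus one oscillation with frequency 1.3.
ts = [0.1 * k for k in range(60)]
ys = [2.0 + 0.3 * t - 0.05 * t * t
      + 0.8 * math.sin(1.3 * t) + 0.4 * math.cos(1.3 * t) for t in ts]

# Outer search over the single nonlinear parameter; the five conditionally
# linear parameters are profiled out exactly at each candidate frequency.
best_w, best_rss = min(((w, fit_linear(ts, ys, w)) for w in [0.9, 1.1, 1.3, 1.5]),
                       key=lambda p: p[1])
```

Because the inner problem is solved exactly, the outer optimizer never sees gradients with respect to the linear coefficients, which is how variable projection sidesteps the singular-gradient problem described above.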