ENBIS-13 in Ankara

15 – 19 September 2013
Abstract submission: 5 February – 5 June 2013

My abstracts


The following abstracts have been accepted for this event:

  • A Data Mining Approach for Yield Loss Cause Identification in the Semiconductor Industry

    Authors: Hasna Barkia (Ecole des Mines de Saint Etienne), Xavier Boucher (Ecole des Mines de Saint Etienne)
    Primary area of focus / application: Mining
    Keywords: Data mining, Yield loss cause identification, Semiconductor industry, Clustering, Association rule mining
    Submitted at 16-Apr-2013 10:22 by BARKIA Hasna
    17-Sep-2013 18:10 A Data Mining Approach for Yield Loss Cause Identification in the Semiconductor Industry
    The semiconductor production cycle is a combination of production and quality inspection steps. The data collected at these steps accumulate into huge volumes stored in heterogeneous databases.
    To identify the yield loss causes for a product "X", we propose a three-step approach whose result is a set of relational patterns corresponding to potential yield loss causes. These patterns are generated by data mining algorithms as associations among descriptive clusters that characterize each of the databases considered.

    First, context identification: using engineering knowledge, we specify a set of the most critical production steps for the product "X" wafers. Let "Mi" be one of these, a succession of a production step "Pi" and a quality inspection step "Qi".

    Second, the data analysis, divided in two steps:
    First, cluster identification.
    From the dataset "dataPi", corresponding to the equipment data collected at a specific process step, we identify clusters corresponding to the different operating modes of the tools at this step. Likewise, "dataQi" is used to identify clusters corresponding to different wafer quality statuses. We use clustering algorithms to identify these clusters.
    Second, identification of relational patterns between the clusters identified previously.
    We use association rule algorithms to identify relations of the form "cap->dbq" between a production step cluster "cap" and a quality measurement step cluster "dbq".

    The last step is interpretation: starting from the set of association rules identified previously and with the help of specialized engineers, we identify which of the quality clusters "dbq" correspond to a potential yield loss situation, and identify the different causes "cap" that lead to each such cluster "dbq".
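    The three steps above can be sketched on toy data. Everything below is invented for illustration (simulated one-dimensional equipment and defect data, a tiny k-means, and support/confidence rule mining); it is not the authors' implementation.

```python
# Illustrative sketch: cluster process data ("dataPi") and quality data
# ("dataQi") separately, then mine "process cluster -> quality cluster" rules.
import random

random.seed(0)

# Step 1 (context): one critical step Mi = (Pi, Qi); simulate per-wafer data.
# Equipment runs in two operating modes (around 10 and around 20); wafers
# processed in the high mode tend to show many more defects.
n = 200
data_pi = [random.gauss(10, 1) if i % 2 else random.gauss(20, 1) for i in range(n)]
data_qi = [random.gauss(5, 1) if p < 15 else random.gauss(30, 2) for p in data_pi]

def kmeans_1d(xs, k=2, iters=20):
    """Tiny 1-D k-means (Lloyd's algorithm): returns a label per observation."""
    centers = sorted(random.sample(xs, k))
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        for j in range(k):
            members = [x for x, l in zip(xs, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

# Step 2a: cluster identification ("cap" = process clusters, "dbq" = quality).
cap = kmeans_1d(data_pi)
dbq = kmeans_1d(data_qi)

# Step 2b: relational patterns "cap -> dbq" via support and confidence.
rules = []
for a in set(cap):
    for b in set(dbq):
        both = sum(1 for c, d in zip(cap, dbq) if c == a and d == b)
        only_a = sum(1 for c in cap if c == a)
        support, confidence = both / n, both / only_a
        if confidence > 0.8:            # keep strong rules only
            rules.append((a, b, support, confidence))

# Step 3 (interpretation): a strong rule ties one operating mode to one
# quality cluster, flagging it as a potential yield loss cause.
for a, b, s, c in rules:
    print(f"P-cluster {a} -> Q-cluster {b}: support={s:.2f}, confidence={c:.2f}")
```

In a real fab, the clustering and the rule mining would each run on the heterogeneous databases of the selected steps, and the thresholds would be tuned with the process engineers.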
  • Multi-Objective Optimization based on Multivariate Mean-Covariance Models

    Authors: Nikolaus Rudak (TU Dortmund University), Sonja Kuhnt (TU Dortmund University)
    Primary area of focus / application: Design and analysis of experiments
    Keywords: Multiple responses, Optimization, Robust parameter design, Double generalized linear model
    Submitted at 16-Apr-2013 15:11 by Nikolaus Rudak
    17-Sep-2013 16:15 Multi-Objective Optimization based on Multivariate Mean-Covariance Models
    Many technical applications include more than one quality characteristic and therefore lead to multiresponse optimization problems where the aim is to get the mean on target while minimizing the variance. Such problems can be tackled with the Joint Optimization Plot (JOP) method, introduced by Kuhnt and Erdbrügge (2004), where a risk function is minimized for a whole sequence of cost matrices. Results are displayed in the Joint Optimization Plot. This method is implemented in the R package JOP (Kuhnt and Rudak, 2013). So far, applications have mainly been based on separate models for each response and on diagonal cost matrices, for which the JOP method leads to Pareto-optimal settings, as shown in Erdbrügge, Kuhnt and Rudak (2011). However, independence of responses cannot always be assumed. We therefore make use of joint mean-covariance models in which the covariance matrix is allowed to depend on covariates (Pourahmadi (1999, 2000); Hoff and Niu (2012)) and apply these models to a recent data set arising from a thermal spraying process.


    Pourahmadi, M. (1999), "Joint mean-covariance models with applications to longitudinal data: unconstrained reparametrisation", Biometrika, 86: 677-690.

    Pourahmadi, M. (2000), "Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix", Biometrika, 87: 425-435.

    Kuhnt, S. and Erdbrügge, M. (2004), "A strategy of robust parameter design for multiple responses", Statistical Modelling, 4: 249-264.

    Erdbrügge, M., Kuhnt, S., Rudak, N. (2011), "Joint optimization of independent multiple responses based on loss functions", Quality and Reliability Engineering International, 27: 689–703. doi: 10.1002/qre.1229

    Hoff, P.D. and Niu, X. (2012), "A Covariance Regression Model", Statistica Sinica, 22: 729-753.

    Kuhnt, S. and Rudak, N. (2013), "Simultaneous Optimization of Multiple Responses with the R-Package JOP", to appear in Journal of Statistical Software.
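    The core idea of minimizing a quadratic risk over a whole sequence of cost matrices can be shown on a toy problem. The two response surfaces, targets, and the diagonal cost matrices below are invented for illustration; the real method lives in the R package JOP.

```python
# Hedged sketch of JOP-style risk minimization: for each cost matrix
# C = diag(w, 1 - w), minimize (y(x) - t)' C (y(x) - t) over the settings x.
def y1(x):
    return x          # hypothetical response 1, target t1 = 1

def y2(x):
    return x          # hypothetical response 2, target t2 = 3

targets = (1.0, 3.0)
grid = [i / 100 for i in range(0, 401)]   # candidate settings x in [0, 4]

def risk(x, w):
    """Quadratic loss with diagonal cost matrix diag(w, 1 - w)."""
    d1, d2 = y1(x) - targets[0], y2(x) - targets[1]
    return w * d1 * d1 + (1 - w) * d2 * d2

# Minimize the risk for a whole sequence of cost matrices, as in the JOP plot.
settings = []
for k in range(1, 10):
    w = k / 10
    best = min(grid, key=lambda x: risk(x, w))
    settings.append((w, best))

# As weight shifts from response 2 to response 1, the compromise setting moves
# from x = 3 toward x = 1, tracing a curve of Pareto-type trade-off settings.
for w, x in settings:
    print(f"w = {w:.1f}  ->  x* = {x:.2f}")
```

Here the minimizer has the closed form x* = 3 - 2w, so the printed settings walk monotonically between the two single-response optima; the Joint Optimization Plot displays exactly this kind of trade-off path for the practitioner to choose from.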
  • Nonparametric Bayesian Estimation of a POD Function

    Authors: Merlin Keller (EDF)
    Primary area of focus / application: Other: French special session
    Keywords: Structural reliability, Uncertainty analysis, POD function, Bayesian estimation, Dirichlet process mixture model
    Submitted at 17-Apr-2013 14:05 by Merlin Keller
    17-Sep-2013 17:30 Nonparametric Bayesian Estimation of a POD Function
    Many complex industrial systems have high safety requirements such that no failure is observed during their lifetime. Nevertheless, companies have to periodically justify their resistance to several extreme scenarios. In this context, the behaviour of certain critical components is modelled and simulated using complex numerical codes. The system’s reliability properties are then obtained by propagating uncertainties from the input variables to the output quantity of interest.
    In this structural reliability context, a key input parameter of the physical model of a steel component is the size distribution of non-evolving manufacturing flaws.
    This distribution can be estimated from a mixture of measurements from destructive lab experiments, seen as a perfect sample of the target distribution, and measurements from periodic in-service inspections, which are plagued by noise and limited numerical precision. Moreover, during these inspections, any given flaw may or may not be detected, with a probability depending on its size. This crucial characteristic of the measurement process is modelled as an increasing function of the flaw size, known as the probability of detection (POD).
    POD functions are often modelled by assuming a parametric form, such as a log-normal or logistic CDF. However, we show that in certain cases such models are invalidated by the data at hand, and we propose instead a much more flexible way of estimating the POD function, which does not assume any predefined shape. We then demonstrate the impact of POD modelling on the ensuing reliability analysis, on simulated as well as actual datasets.
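    The abstract's estimator is Bayesian (a Dirichlet process mixture); as a much simpler illustration of shape-free POD estimation, the sketch below fits a monotone (isotonic) detection curve to simulated detect/no-detect data with the pooled adjacent violators algorithm, assuming nothing beyond monotonicity. All data and thresholds are invented.

```python
# Shape-free monotone POD estimate on simulated inspection outcomes.
import math
import random

random.seed(1)

# Simulated inspections: detection probability rises with flaw size a (mm).
true_pod = lambda a: 1 / (1 + math.exp(-2 * (a - 3)))
sizes = sorted(random.uniform(0, 6) for _ in range(300))
detected = [1 if random.random() < true_pod(a) else 0 for a in sizes]

def pava(ys, ws):
    """Pooled adjacent violators: weighted isotonic (nondecreasing) fit."""
    vals, wts, counts = [], [], []
    for y, w in zip(ys, ws):
        vals.append(y); wts.append(w); counts.append(1)
        # Merge adjacent blocks while monotonicity is violated.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w2 = wts[-2] + wts[-1]
            v2 = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w2
            vals[-2:], wts[-2:], counts[-2:] = [v2], [w2], [counts[-2] + counts[-1]]
    out = []
    for v, c in zip(vals, counts):
        out.extend([v] * c)
    return out

# Monotone POD estimate at the observed sizes: no log-normal or logistic
# shape is assumed, only that detection probability grows with flaw size.
pod_hat = pava(detected, [1.0] * len(detected))
print(round(pod_hat[0], 2), round(pod_hat[-1], 2))
```

The fitted step function climbs from near 0 for the smallest flaws to near 1 for the largest, and its shape is driven entirely by the data, which is the property the abstract exploits (there in a fully Bayesian, mixture-based form).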
  • Big Data: How Tourism and Other Industries Might Benefit

    Authors: Andrea Ahlemeyer-Stubbe
    Primary area of focus / application: Business
    Keywords: Data Mining, Big Data, Predictive Modelling, Data Integration
    Submitted at 17-Apr-2013 17:40 by Andrea Ahlemeyer-Stubbe
    16-Sep-2013 11:00 Big Data: How Tourism and Other Industries Might Benefit
    The increasing use of technology in many industries, especially in areas of the travel & tourism sector, makes it possible to collect digital data in a very fast and efficient way. In fact, it allows you to be a constant aid for all travel-related issues, to speed up processes, to balance your utilization and to establish a digital dialogue with your guests. You will be able to learn whether a guest is likely to become a brand ambassador or a defector. All of these technologies generate petabytes of data that should be used to reach your business objectives.

    These data pools can be used to obtain a higher quality of information compared to the data obtained from old fashioned booking and communication processes. This is where BIG DATA comes in.

    Apart from data storage, you will need data mining techniques. It is possible to "mine" data pools for "hidden" insights and information. These insights give you a major competitive advantage in attracting new customers and turning them into frequent guests and loyal customers.
    Insights paired with individualized and data driven communication are crucial for the future development of the tourism sector.

    Especially in sales and marketing, knowledge about future customer behaviour as well as customer intentions and desires will make the difference between failure and success for any campaign.
  • Parallel Optimization of a Sheet Metal Forming Process based on FANOVA Graph Decomposition

    Authors: Momchil Ivanov (TU Dortmund University)
    Primary area of focus / application: Design and analysis of experiments
    Keywords: parallel black-box optimization, EGO, metamodels, FANOVA graph decomposition, sheet metal forming
    Submitted at 18-Apr-2013 15:04 by Momchil Ivanov
    16-Sep-2013 11:20 Parallel Optimization of a Sheet Metal Forming Process based on FANOVA Graph Decomposition
    The modeling and optimization of expensive black-box functions is often performed with the help of metamodel-based sequential strategies. A popular choice is the efficient global optimization (EGO) algorithm, which is based on the Kriging metamodel. Although the EGO method is known for its efficiency, a big limitation of this procedure is the fact that it allows only one simulation at a time. Being able to take advantage of several simulators simultaneously, and thus parallelizing the optimization, is a very appealing idea.
    In this presentation an elegant way to produce a parallel optimization procedure based on FANOVA graph, a technique from the sensitivity analysis toolbox, is presented. The FANOVA graph procedure is able to detect the additive structure of a function. Given such an additive interaction structure, which is often present in real applications, the additive blocks can be optimized simultaneously and independently of each other, e.g. using EGO. This allows the experimenter to use several simulators and to perform a parallel optimization.
    This talk presents the application of the described parallelization technique and discusses problems that may arise in the process. Different models, such as kernel interpolation, are also discussed as a basis for metamodel-driven optimization. In conclusion, the procedure is tested on a sheet metal forming simulation. Sheet metal forming is used for the production of automobile body parts. Unfortunately, defects such as tearing, wrinkling or springback are frequently observed. It is shown that the new procedure allows for an efficient process optimization, leading to reduced sheet tearing.
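    The payoff of a detected additive structure can be shown on a toy function. Everything here is invented for illustration: the test function, and a plain grid search standing in for the sequential EGO runs that the talk would use on each block.

```python
# If FANOVA graph reveals f(x) = g(x1, x2) + h(x3, x4), the two blocks can be
# minimized independently, e.g. on separate simulators running concurrently.
from concurrent.futures import ThreadPoolExecutor

def g(x1, x2):                 # first additive block, minimum at (1, -1)
    return (x1 - 1) ** 2 + (x2 + 1) ** 2

def h(x3, x4):                 # second additive block, minimum at (-2, 0.5)
    return (x3 + 2) ** 2 + (x4 - 0.5) ** 2

def f(x1, x2, x3, x4):         # the full black box, additive in the two blocks
    return g(x1, x2) + h(x3, x4)

def minimize_block(block_fun):
    """Grid search over [-3, 3]^2; a sequential EGO run would go here."""
    grid = [i / 10 for i in range(-30, 31)]
    return min((block_fun(a, b), a, b) for a in grid for b in grid)

# The block optimizations are independent, so they can run in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    res_g, res_h = pool.map(minimize_block, [g, h])

best = (res_g[1], res_g[2], res_h[1], res_h[2])
print("optimum:", best, "f =", f(*best))
```

Because the blocks share no variables, combining the two block minimizers yields the minimizer of the full function, and the wall-clock cost is that of the slower block rather than the sum, which is exactly the appeal of the FANOVA-graph-based parallelization.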
  • Using Conditional Linearity in Nonlinear Regression Analysis for Short-Term Prediction of High Water Levels

    Authors: Alessandro Di Bucchianico (Eindhoven University of Technology), Jan-Rolf Hendriks (Dutch Ministry of Transport, Public Works and Water Management), Krijn Saman (Dutch Ministry of Transport, Public Works and Water Management)
    Primary area of focus / application: Modelling
    Keywords: short-term prediction, water level, nonlinear regression, conditional linearity, singular gradient, variable projection algorithm
    Submitted at 19-Apr-2013 09:21 by Alessandro Di Bucchianico
    18-Sep-2013 09:20 Using Conditional Linearity in Nonlinear Regression Analysis for Short-Term Prediction of High Water Levels
    After the 1953 flood in the South-West Netherlands, which killed almost 2000 people, the Dutch government initiated the Delta Plan, a large-scale protection system consisting of dikes and dams. An important part of the Delta Plan is the Eastern Scheldt Storm Surge Barrier, a bridge-like construction that can be closed when water levels become too high.
    When water levels are above 275 cm, a Decision Team is physically present at the barrier to close it at the right moment. In order to keep flooding risk at acceptable levels, the barrier automatically closes and takes over full control when the predicted water level reaches exactly 300 cm. In order to avoid environmental damage, the barrier is not allowed to close at predicted water levels below 300 cm. It is thus important for the Decision Team to avoid unnecessary closures of the barrier as well as to avoid that the barrier takes over full control.
    The Decision Team has access to several long-term water level predictions. As an addition to these existing predictions, we are developing short-term prediction models. Part of these short-term prediction models is a nonlinear regression model with a quadratic polynomial to model the trend and various sine waves to model oscillations. Straightforward use of nonlinear regression on water levels measured every 10 seconds during the past 30 minutes surprisingly failed because of singular gradients. Appropriate scaling of the variables involved had a clear effect but was not sufficient to overcome the numerical problems. We will show how using the fact that the model has conditionally linear parameters solves all numerical problems. Implementations of the so-called variable projection algorithm of Golub and Pereyra in R and Matlab will be presented in order to illustrate how to use conditional linearity in practice.
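    The conditional-linearity idea can be sketched on a stripped-down version of such a model. The data and the single-sine model below are invented for illustration (the talk's model has a quadratic trend and several sine waves): in y = a + b*sin(omega*t), only omega is genuinely nonlinear, so for each candidate omega the best a and b follow from a closed-form linear least-squares fit, which is the core of the Golub-Pereyra variable projection algorithm.

```python
# Variable projection in miniature: profile out the conditionally linear
# parameters (a, b) and search only over the nonlinear frequency omega.
import math

# Simulated water-level-like series: a mean level plus one oscillation.
true_a, true_b, true_omega = 250.0, 20.0, 0.7
ts = [0.1 * i for i in range(300)]
ys = [true_a + true_b * math.sin(true_omega * t) for t in ts]

def profiled_sse(omega):
    """Solve the 2x2 normal equations for (a, b) at fixed omega; return SSE, a, b."""
    s = [math.sin(omega * t) for t in ts]
    n, Ss, Sss = len(ts), sum(s), sum(v * v for v in s)
    Sy, Ssy = sum(ys), sum(v * y for v, y in zip(s, ys))
    det = n * Sss - Ss * Ss
    a = (Sy * Sss - Ss * Ssy) / det
    b = (n * Ssy - Ss * Sy) / det
    sse = sum((y - a - b * v) ** 2 for y, v in zip(ys, s))
    return sse, a, b

# Only the single nonlinear parameter is searched; a and b come for free,
# so the iteration never sees a singular gradient in the linear directions.
candidates = [0.1 + 0.01 * k for k in range(200)]    # omega in [0.1, 2.09]
best_omega = min(candidates, key=lambda w: profiled_sse(w)[0])
sse, a_hat, b_hat = profiled_sse(best_omega)
print(f"omega={best_omega:.2f}, a={a_hat:.1f}, b={b_hat:.1f}")
```

The crude grid over omega stands in for the Gauss-Newton-type outer iteration of the actual variable projection algorithm; the point is that the ill-conditioned joint fit over (a, b, omega) is replaced by a well-behaved search over omega alone.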