Minutes of a Meeting of the Group held on Tuesday 18 September 2001, beginning at 14.00 hours, at The House of Engineers, Oslo.
- Andrea Ahlemeyer-Stubbe (Germany) (Chair)
- Peter Dixon (United Kingdom) (Minutes)
- Espen Haugen (Norway)
- Ragnar Furunes (Norway)
- Paolo Giudici (Italy) (co-Chair)
- Ilkka Karanta (Finland)
- Torulf Mollestad (Norway)
- Magnus Pettersson (Sweden)
In introductory remarks Andrea announced that, apart from the ENBIS 2002 Conference, there are two upcoming Data Mining conferences in 2002 (ICDM, Leipzig, June 2002, and Data Mining 2002, Bologna, September 2002). She indicated that the priorities for the Group should be to prepare materials for the ENBIS Conference in Rimini, September 2002, which takes place just before Data Mining 2002 in Bologna, and to consider ways of expanding the Group. Paolo supported these points and stressed how important it was for statisticians to begin to make an impact in Data Mining, thus far dominated by computer scientists and marketing specialists. It was suggested that information on these two and other DM conferences could be displayed on the ENBIS/DM webpages. Members agreed that the Group should try to help ENBIS reach its goals in this way, and by preparing lists of literature and events and generally popularising Data Mining.
Harald asked for clarification of the purpose and role of ENBIS. Andrea and Paolo emphasised the networking aspect, and highlighted the importance of interfacing between research and application. In Data Mining, it was essential to evaluate existing methods and to present new methods. Whilst it was important to persuade company-based data miners to present their ideas and case studies, it was sometimes difficult for these workers to gain company approval to release sensitive information. The Group could consider ways to overcome this. Indeed, the Sept 2001 ENBIS Conference was organised mainly by academics, and the Group wondered if an organising committee of industrial statisticians might have a different approach. There is a need to educate company specialists about the need for and value of conferences, and a carefully planned conference structure might help this.
Towards these points, it was suggested that open discussion sessions at the September 2002 Conference might be useful. Ilkka thought that it was important to have sessions for practical people and that different levels could be aimed at: (i) case studies, (ii) full research papers, (iii) posters. There was a discussion of the process by which papers should or could be selected. It was essential to support the development of ENBIS through the high quality of conference papers and other input. Torulf thought that it was right to inspire creativity in DM professionals and saw a two-way process between research and 'doing it' as the way forward.
It was agreed that the Data Mining programme for the September 2002 conference should have two ingredients: (i) different activities, e.g. case studies, research papers, discussion groups, maybe even tutorials, and (ii) themes, e.g. BPP, CRM, CLV. Torulf hoped to see presentations/papers indicating which methods were best or most appropriate for different types of problem; there is much conjecture over this, and statisticians have an important role to play in this aspect. Andrea thought that another key area was 'how to connect data', and maybe a speaker on this could be found. Torulf put it even more strongly. He said that statisticians have a responsibility to make an impact on DM. What can we add to the general debate that is of value?
The meeting closed at 15.00 hours with the resolution to act on the above points.
The following members attended the working group at the conference:
- Andrea Ahlemeyer-Stubbe
- Michael Föckel
- Tony Greenfield
- Paolo Giudici
- Peter Hackl
- Marion Hofmans
- Susan Jensen
- Chris Klaassen
- Jürgen Schröder
- F. Torre
- Marion van den Bol
- J.E. Wieringa
- Volker Tresp
Discussion was quite lively and, after brief self-introductions by the participants, focused on two main themes:
- Identification of the domain of interest: what is data mining, why should it be of interest to statisticians, and what can ENBIS do for data mining?
- How the activity of the working group should develop in the near future.
Concerning the first theme, we came to a shared view that data mining is not only the statistical analysis of large databases using "old statistical techniques", but also a new and challenging field, which involves:
- sampling the available massive data appropriately
- learning the data generating mechanism underlying the data at hand
- being able to deliver results of statistical data analysis in ways that are efficient and communicable to practitioners
- working at the interplay between statistical modeling and computationally intensive methods.
Statisticians are not the only community that can do data mining; IT scientists deal with it too. We claim, however, that a sound statistical exploratory, modelling and inferential analysis is necessary for a data mining practice which is not only "efficient" but also "sound", and which can potentially be directed at very specific problems, such as those occurring in customer relationship management. We believe that the role of ENBIS, and of the working group on data mining in particular, is to expose researchers and students in the field of statistics to data mining problems and tools, so that the natural skills of statisticians, such as the ability to model real observational data, can be applied to data mining as well.
To better clarify the content of our activity we classified the methods and applications that occur in the area. Concerning methods, we foresee a three-stage process, with data preparation and data warehousing coming first, then the actual data mining. Data mining tools can be classified into three broad areas: association rules, classification problems, and predictive problems. Methodologies such as descriptive multivariate measures of association, loglinear models and graphical models would fit in the first area. In the second area, regression methods, classification trees, neural networks and cluster analysis seem the most obvious choices. Finally, probabilistic expert systems (Bayesian networks), regression methods and neural networks seem to fit in the third class. Concerning applications, we have decided to confine ourselves to applications that relate to business and industry, leaving aside others, such as epidemiological or genetic ones. We identified some "buzzwords" for areas that we believe are important at the moment and that therefore require a good deal of attention in order to reach out to industrial contacts: customer relationship management (CRM: segmentation and profiling), market basket analysis, web mining (e.g. profiling visitors and optimising sites), churn analysis, fraud detection, credit scoring and risk, and financial risk management.
Concerning the organisation of the activities of the working group we suggest, for the near future:
- a mailing list to the interested ENBIS members, possibly open to the outside world.
- a web space for the group on the ENBIS web site. There we shall put:
  - DATA: a data repository and case studies in the area;
  - KNOWLEDGE POOL: links to working group members' web pages and/or their research interests, including publications; links to important statistical and IT literature; a list of journals to which ENBIS members can submit data mining papers; links to European groups working in data mining in the various subject-matter fields.
We believe it is rather important for members of the working group to also meet in person; for that purpose we have decided to use two forthcoming conferences on data mining as good opportunities for people in the data mining working group to meet. The first of these conferences will be held in July 01 near Munich (Germany) and the second in October 01 in Pavia (Italy). Andrea Ahlemeyer and Paolo Giudici are taking part in the first and the second of these conferences, respectively, as main organisers.
To summarise, we believe statisticians can be very good data miners, but this has to be shown "in the field", in hands-on collaborative experiences with business and industry. The role of ENBIS will be to stimulate this interaction and to create a regulatory framework for this purpose.