ENBIS-20 Pre-Conference Event: Joint ECAS-ENBIS 1-Day Summer Course - POSTPONED

27 September 2020; 09:30 – 18:30

This 1-day course is a joint initiative of ENBIS and ECAS (ecas.fenstats.eu). Since 1987, ECAS has offered courses in specialised areas of statistics for university researchers and teachers as well as for professionals in industry.

The 1-day course is organised under the umbrella of the ENBIS-20 conference in Valencia. You can find more information about the conference here.

Text mining: from basics to deep learning tools

Sunday, 27th September 2020

Instructors

Adrien Guille (Laboratoire ERIC, Université de Lyon, France)
Jairo Cugliari (Laboratoire ERIC, Université de Lyon, France)

Overview

Textual data are pervasive and can be leveraged to help solve a wide range of problems. This new source of information, together with recent advances in text mining, has undeniably affected both industry and academic research. Classical approaches perform reasonably well on diverse text mining tasks, but they make restrictive assumptions that conflict with some properties of natural language. In the last decade, important breakthroughs in representation learning and deep learning have partly relaxed these assumptions, improving performance on several tasks.

The course is a first introduction to text mining aimed at a broad audience of practitioners. We’ll present the classical way of preprocessing, encoding and leveraging text data. Then we’ll introduce recent techniques for learning more meaningful text representations and for exploiting them with deep neural networks. Two case studies with industrial applications will show why modern approaches to representing text matter.

Some experience in programming with R/Python is a plus. No prior knowledge of any deep learning framework is required.

Outline

The provisional table of contents of the course is:

Classical approach

1 - Basic text representation: bag-of-words, n-grams, tf-idf weighting
2 - Classical methods for supervised text classification: linear models, trees and random forests
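Topic 1 can be illustrated with a short sketch. The following is a minimal, standard-library Python example of bag-of-words counts with optional n-grams and tf-idf weighting. It is not the course material. The whitespace tokenizer and the unsmoothed idf = log(N / df) are simplifying assumptions, and the helper names (`tokenize`, `ngrams`, `bag_of_words`, `tfidf`) are hypothetical. Libraries such as scikit-learn use smoothed idf variants by default.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def ngrams(tokens, n):
    # Contiguous n-token sequences joined by a space, e.g. "the cat".
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(doc, n_max=1):
    # Count every n-gram for n = 1..n_max; word order beyond n_max is discarded.
    tokens = tokenize(doc)
    features = []
    for n in range(1, n_max + 1):
        features.extend(ngrams(tokens, n))
    return Counter(features)


def tfidf(corpus, n_max=1):
    # Raw term frequency times unsmoothed idf = log(N / df).
    bows = [bag_of_words(doc, n_max) for doc in corpus]
    n_docs = len(corpus)
    df = Counter()
    for bow in bows:
        df.update(bow.keys())  # each document counts once per term
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in bow.items()} for bow in bows]


corpus = ["the cat sat", "the dog sat", "the cat ran"]
weights = tfidf(corpus)
print(weights[0]["the"])  # 0.0: "the" occurs in every document
print(weights[1]["dog"])  # log(3): "dog" occurs in one document out of three
```

A term that appears in every document gets weight zero, while rarer terms are up-weighted. The resulting sparse vectors are the kind of input that the linear models, trees and random forests of topic 2 are trained on.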

Deep learning approach

3 - Representation learning for text mining: distributional hypothesis, word embeddings (word2vec, GloVe), some geometric properties
4 - Case studies. Detailed presentation of two real problems:
   a. Electricity Forecasting with recurrent neural networks
   b. Sentiment Analysis with convolutional neural networks
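The "geometric properties" in topic 3 usually refer to cosine similarity between word vectors and to analogies solved by vector offsets. The sketch below is a standard-library Python illustration using hand-picked, hypothetical 3-dimensional vectors (`TOY_EMBEDDINGS`), not vectors trained with word2vec or GloVe. The helpers `cosine` and `analogy` are illustrative names, not part of any library.

```python
import math

# Hand-picked 3-d toy vectors (hypothetical, NOT trained embeddings).
# Axes are meant as "royalty", "male", "female"; real word2vec/GloVe
# vectors are learned from a corpus and have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.1, 0.3, 0.3],
}


def cosine(u, v):
    # Cosine of the angle between u and v: 1 = same direction, 0 = orthogonal.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, emb):
    # "a is to b as c is to ?": nearest word (by cosine) to emb[b] - emb[a] + emb[c],
    # excluding the three query words themselves.
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))


print(analogy("man", "king", "woman", TOY_EMBEDDINGS))  # queen
print(cosine(TOY_EMBEDDINGS["man"], TOY_EMBEDDINGS["woman"]))  # 0.0
```

With trained embeddings the offset king - man + woman only lands *near* queen, which is why the nearest neighbour is searched by cosine similarity rather than by exact match.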

Labs

1 - Loading and representing text with R/Python
2 - Abusive language detection: classifying toxic Wikipedia comments
3 - Encoding text data with word2vec and GloVe
4 - Designing deep neural networks on top of word embeddings with Keras

Short Bio

Jairo Cugliari is Associate Professor of Statistics at the University of Lyon. His research focuses on industrial data science problems involving complex data such as functions, texts, multicriteria data or time series. http://eric.univ-lyon2.fr/jcugliari/
Adrien Guille is Associate Professor of Computer Science at the University of Lyon. His research interest lies in developing machine learning methods to deal with textual corpora, with an emphasis on structured corpora (i.e. networks of documents). http://mediamining.univ-lyon2.fr/people/guille/