WSIE – Web Scale Information Extraction
Co-located with ECML/PKDD 2013. Date: 27th of September, 2013

This tutorial analyses open challenges for Web-scale Information Extraction (IE) and introduces the use of Linked Data as a ground-breaking solution for the field.

Training data is an essential resource for machine learning, but it is expensive to create. Its limited availability has so far prevented the study of how large-scale resources can be used in a general way and ported to specific user information needs in Information Extraction tasks.

Over the last few years Linked Data has grown into a gigantic knowledge base which, as of 2013, comprised 31 billion triples in 295 data sets (http://lod-cloud.net/state). Such resources can become invaluable training data for Web-scale Information Extraction and natural language tasks because they (i) are very large scale, (ii) are constantly growing, (iii) cover multiple domains and (iv) are being used to annotate a growing number of pages that can be exploited for training.

This tutorial will show how to exploit Linked Data for IE and will explore Information Extraction techniques that can scale to the Web and adapt to user information needs.
We will particularly focus on the tasks of Wrapper Induction and Table Interpretation. As an example of Linked Data-driven IE, we will present and discuss a multi-strategy learning method and framework designed to train Web-scale IE using Linked Data, while coping with noise in the training data. The approach uses multiple strategies: (i) it wraps very regular web sites generated by backing databases; (ii) it extracts from regular structures such as tables and lists; and (iii) it learns lexico-syntactic extraction patterns for information extraction from natural language.
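
As a rough illustration of strategy (i), the sketch below shows one way seed labels taken from a Linked Data set could be used to induce a wrapper for a regular web site. It is not the tutors' method: the names PathCollector, induce_wrapper, apply_wrapper, SEED_TITLES and the sample pages are all hypothetical, and a simple majority vote over tag paths stands in for whatever noise handling the actual framework uses.

```python
from collections import Counter
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Record (tag path, text) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def text_nodes(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


def induce_wrapper(pages, seed_labels):
    """Return the tag path whose text most often equals a seed label."""
    seeds = {s.lower() for s in seed_labels}
    votes = Counter(
        path
        for page in pages
        for path, text in text_nodes(page)
        if text.lower() in seeds
    )
    return votes.most_common(1)[0][0] if votes else None


def apply_wrapper(page, path):
    """Extract the text of every node that sits at the induced path."""
    return [text for p, text in text_nodes(page) if p == path]


# Hypothetical seed labels, standing in for values read from a Linked Data set.
SEED_TITLES = ["Blade Runner", "Alien"]
TRAIN_PAGES = [
    "<html><body><h1 class='title'>Blade Runner</h1>"
    "<p>Review of Alien and more</p></body></html>",
    # The <p> here is a noisy match: a seed label outside the title slot.
    "<html><body><h1 class='title'>Alien</h1><p>Blade Runner</p></body></html>",
]
NEW_PAGE = "<html><body><h1 class='title'>Solaris</h1><p>Some text</p></body></html>"

wrapper = induce_wrapper(TRAIN_PAGES, SEED_TITLES)
print(wrapper)
print(apply_wrapper(NEW_PAGE, wrapper))
```

Voting across pages is what lets the sketch tolerate the stray match in the second page: the title path collects two votes, the paragraph path only one, so the induced wrapper still picks out the title on an unseen page.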

Tutors: Dr Anna Lisa Gentile, Dr Ziqi Zhang

Visual Analytics with Social Media for Crisis Management
At Integrative and Analytical Approaches to Crisis Response and Emergency Management Information Systems (ISCRAM 2013), May 12-15, 2013

The tutorial will first examine the various social media sources available and how to get access to them. Freely available open source tools and libraries for data visualisation will be presented alongside techniques to analyse and extract information from social media. A hands-on session will involve participants in creating visual analytics solutions for a pre-defined dataset, followed by a group discussion on possible new designs and interactions.

Organisers: Vitaveska Lanfranchi, Suvodeep Mazumdar, Andrea Varga