Research projects




May 2006-: head of language and translation technology group (Hogeschool Gent, Departement Vertaalkunde).


2005-April 2006: COREA, "Coreference Resolution for Extracting Answers" (Antwerp University).

Coreference resolution is a key ingredient for the automatic interpretation of text. It has been studied mainly from a linguistic perspective, with an emphasis on establishing potential antecedents for pronouns. Practical applications, such as Information Extraction (IE), summarization and Question Answering (QA), require accurate identification of coreference relations between noun phrases in general. Computational systems that assign such relations automatically require a sufficient amount of annotated data for training and testing. For Dutch, annotated data is scarce and coreference resolution systems are lacking.
In the COREA project, a two-year project which started in July 2005, we aim to develop a robust system for assigning such relations automatically, and we will investigate how making coreference relations explicit affects the accuracy of systems for IE and QA. We will also annotate a limited amount of application-specific corpus material, which is required to evaluate the coreference resolution system in the context of IE and QA.




PhD Thesis Research, "Optimization Issues in Machine Learning of Coreference Resolution" (Antwerp University).

The thesis presents a machine learning approach to the resolution of coreferential relations between nominal constituents in Dutch. It is the first automatic resolution approach proposed for this language. The corpus-based strategy was enabled by the annotation of a substantial corpus (ca. 12,500 noun phrases) of Dutch news magazine text with coreferential links for pronominal, proper noun and common noun coreferences. Based on the hypothesis that different types of information sources contribute to a correct resolution of different types of coreferential links, we propose a modular approach in which a separate module is trained per NP type. Lacking comparative results for Dutch, we also perform all experiments for the English MUC-6 and MUC-7 data sets, which are widely used for evaluation.




2000-2004: PROSIT, "Prosody from Information in Text" (Antwerp University).

PROSIT is a transnational research programme carried out in Tilburg, the Netherlands (Induction of Linguistic Knowledge, ILK) and in Antwerp, Belgium (Center for Dutch Language and Speech, CNTS). It is funded by the Flemish-Dutch Committee of the National Foundations for Research in the Netherlands and Belgium, under the official project title "Automatic text analysis and machine learning for prosody".
The project investigates the generation of prosodic structure for a text-to-speech synthesis system. Generating accurate prosody is one of the most crucial developments needed to bring speech synthesis to a level of pleasant fluency. Within the project, prosody generation is treated as a natural language processing problem rather than a speech technology problem: it is defined as the prediction of prosodic markers (accents and breaks) by means of automatic analysis of written text, and is less concerned with how these markers are to be interpreted in terms of appropriate melodic, durational and other prosodic features when the text is converted into speech.
The central question is whether prosody generation can be accurately performed by (a) robust automatic analysis of texts using techniques from information retrieval and natural language processing, and (b) advanced machine learning systems and meta-learning systems such as combiners and boosting ensembles. The target language is Dutch.




November 1998-2000: LINGUADUCT, "An Inductive Machine Learning Environment for Corpus Annotation" (Antwerp University).