May 2006-: head of the language and translation technology group
(Hogeschool Gent, Departement Vertaalkunde).
| |||
|
Coreference resolution is a key ingredient for the automatic
interpretation of text. It has been studied mainly from a
linguistic perspective, with an emphasis on establishing potential
antecedents for pronouns. Practical applications, such as
Information Extraction (IE), summarization and Question Answering
(QA), require accurate identification of coreference relations
between noun phrases in general. Computational systems for
assigning such relations automatically require the availability
of a sufficient amount of annotated data for training and
testing. For Dutch, annotated data is scarce and coreference
resolution systems are lacking.
| |||
|
The thesis presents a machine learning approach to the resolution of coreferential relations between nominal constituents in Dutch. It is the first automatic resolution approach proposed for this language. The corpus-based strategy was enabled by the annotation of a substantial corpus (ca. 12,500 noun phrases) of Dutch news magazine text with coreferential links for pronominal, proper noun and common noun coreferences. Based on the hypothesis that different types of information sources contribute to a correct resolution of different types of coreferential links, we propose a modular approach in which a separate module is trained per NP type. Lacking comparative results for Dutch, we also perform all experiments for the English MUC-6 and MUC-7 data sets, which are widely used for evaluation.
|
PROSIT is a transnational research programme carried out in
Tilburg, the Netherlands (Induction of Linguistic Knowledge, ILK)
and in Antwerp, Belgium (Center for Dutch Language and Speech,
CNTS). It is funded by the Flemish-Dutch Committee of the National
Foundations for Research in the Netherlands and Belgium, under the
official project title "Automatic text analysis and machine
learning for prosody".
|