Mapping between WordNet Domains and Wikipedia categories

The WordNet Domains distribution also includes two mappings: one between WordNet-Domains and WordNet topics, and one between WordNet-Domains and the emergent Wikipedia categories.

WordNet-Domains and WordNet topics. Starting from version 3.0, Princeton WordNet associates topic information with a subset of its synsets. This topic labeling is implemented through pointers from a source synset to a target synset that represents the topic, and it was developed independently of WordNet-Domains.
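As an illustration of the pointer structure described above, the following minimal sketch stores topic pointers as (source synset, topic synset) pairs and counts how often each topic co-occurs with each WordNet-Domains label. All identifiers and data here are hypothetical placeholders, and counting co-occurrences is only one plausible way to compare the two labelings; it is not claimed to be how the distributed mapping was built.

```python
from collections import defaultdict

# Hypothetical toy data: each pair is a pointer from a source synset
# to the target synset that represents its topic.
TOPIC_POINTERS = [
    ("source-synset-1", "topic-synset-A"),
    ("source-synset-2", "topic-synset-A"),
    ("source-synset-3", "topic-synset-B"),
]

# Hypothetical WordNet-Domains labels for the same source synsets.
SYNSET_DOMAINS = {
    "source-synset-1": {"domain-x"},
    "source-synset-2": {"domain-x", "domain-y"},
    "source-synset-3": {"domain-z"},
}


def topic_domain_counts(pointers, synset_domains):
    """Count, for every topic synset, the domain labels of its source synsets."""
    counts = defaultdict(lambda: defaultdict(int))
    for source, topic in pointers:
        # A source synset without domain labels contributes nothing.
        for domain in synset_domains.get(source, ()):
            counts[topic][domain] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {topic: dict(domains) for topic, domains in counts.items()}


TOPIC_DOMAIN_COUNTS = topic_domain_counts(TOPIC_POINTERS, SYNSET_DOMAINS)
```

In this toy setting, the domain that co-occurs most often with a topic would be a natural candidate for its counterpart, subject to manual checking.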

WordNet-Domains - Multilingual Wikipedia categories. This mapping yields a coarse alignment between WordNet and Wikipedia that is useful for producing domain-specific, multilingual corpora. Multilinguality is achieved through the cross-language links between Wikipedia categories. Research in word-sense disambiguation has shown that, within a specific domain, relevant words have restricted senses. The multilingual, comparable, domain-specific corpora we produce could therefore support research in word-sense disambiguation and terminology extraction across languages, which in turn could improve the performance of various NLP tasks.

Example of WordNet Domains Mappings to Wikipedia categories.

Through this mapping, we can associate each WordNet domain with a domain-specific corpus, gathered from the articles subsumed by the corresponding Wikipedia categories, for each of the desired languages. A corpus of domain-labelled Wikipedia articles is available upon request.
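The corpus-gathering step can be sketched as a breadth-first walk over the category graph. This is a minimal sketch under stated assumptions, not the procedure actually used to build the corpus: the domain names, category names, article titles, the use of English as the pivot language, and the depth limit are all hypothetical. The visited set is there because Wikipedia category graphs can contain cycles.

```python
from collections import deque

# Hypothetical toy data. English is assumed to be the pivot language.
DOMAIN_TO_CATEGORIES = {"domain-x": ["Category:X"]}
LANGUAGE_LINKS = {"Category:X": {"it": "Categoria:X"}}  # cross-language links
SUBCATEGORIES = {
    "en": {"Category:X": ["Category:X1"], "Category:X1": ["Category:X"]},  # cycle
    "it": {"Categoria:X": []},
}
ARTICLES = {
    "en": {"Category:X": ["Article 1"], "Category:X1": ["Article 1", "Article 2"]},
    "it": {"Categoria:X": ["Voce 1"]},
}


def categories_for(domain, lang):
    """Return the root categories of a domain in the requested language."""
    english = DOMAIN_TO_CATEGORIES.get(domain, [])
    if lang == "en":
        return list(english)
    # Follow cross-language links; categories without a link are skipped.
    linked = (LANGUAGE_LINKS.get(cat, {}).get(lang) for cat in english)
    return [cat for cat in linked if cat is not None]


def collect_articles(domain, lang, max_depth=3):
    """Gather article titles subsumed by the domain's categories."""
    roots = categories_for(domain, lang)
    seen = set(roots)
    queue = deque((root, 0) for root in roots)
    articles = []
    while queue:
        category, depth = queue.popleft()
        for title in ARTICLES.get(lang, {}).get(category, []):
            if title not in articles:  # keep each article once
                articles.append(title)
        if depth < max_depth:
            for child in SUBCATEGORIES.get(lang, {}).get(category, []):
                if child not in seen:  # guards against category cycles
                    seen.add(child)
                    queue.append((child, depth + 1))
    return articles


ENGLISH_CORPUS = collect_articles("domain-x", "en")
ITALIAN_CORPUS = collect_articles("domain-x", "it")
```

A depth limit of this kind is a design choice: deep descendants of a Wikipedia category often drift away from the original domain, so a small limit trades coverage for topical precision.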