EXTRA-LEX: Automatic extraction of bilingual Galician-Spanish lexicons and updating of the lexicographic resources of machine translation engines

This project aims to build an automatic bilingual lexicon extractor for Galician and Spanish, and to use the extracted lexicons to improve and update the computational dictionaries used by machine translation systems. The extraction method relies on techniques for exploiting non-parallel corpora that cover comparable topics (comparable corpora).
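The description does not detail the extraction algorithm. A common approach for comparable corpora is the context-vector method: a word and its translation tend to co-occur with words that are themselves translations of each other. The sketch below illustrates that general technique and is not presented as the EXTRA-LEX method. It assumes whitespace tokenization and a small seed Galician-Spanish dictionary; the function names, seed entries and toy sentences are hypothetical.

```python
from collections import Counter, defaultdict
import math

def context_vectors(sentences, window=2):
    """Count how often each word co-occurs with neighbours within +/- window tokens."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def translate_vector(vec, seed):
    """Map a Galician context vector into Spanish using a (hypothetical) seed dictionary.
    Context words missing from the seed dictionary are dropped."""
    out = Counter()
    for w, c in vec.items():
        if w in seed:
            out[seed[w]] += c
    return out

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(word, gl_vecs, es_vecs, seed):
    """Rank Spanish words by how similar their contexts are to the translated Galician context."""
    query = translate_vector(gl_vecs[word], seed)
    scores = {cand: cosine(query, vec) for cand, vec in es_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy comparable corpora (not parallel in general; kept tiny for illustration).
gl_corpus = ["o can come carne", "o can ladra moito"]
es_corpus = ["el perro come carne", "el perro ladra mucho"]
seed = {"o": "el", "come": "come", "carne": "carne", "ladra": "ladra", "moito": "mucho"}

ranking = rank_candidates("can", context_vectors(gl_corpus), context_vectors(es_corpus), seed)
print(ranking[0][0])  # perro
```

Context words that the seed dictionary does not cover are simply ignored, so in this kind of method the coverage of the seed lexicon strongly affects which translations can be found.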

Special attention will be paid to the automatic learning of translation equivalents for multiword expressions, which are poorly represented in manually built linguistic resources yet essential for improving the quality of translation engines. In particular, the project works on improving and updating the lexicographic resources of the open-source translation engine Open Trad.
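The project does not say how multiword terms are detected. One simple and widely used starting point is to score adjacent word pairs with pointwise mutual information (PMI), PMI(x, y) = log2(P(x, y) / (P(x) P(y))), and keep the highest-scoring pairs as candidates; practical term extractors usually add part-of-speech patterns and longer n-grams. The function name, the `min_count` threshold and the toy Spanish sentences below are assumptions for illustration only.

```python
from collections import Counter
import math

def pmi_bigrams(sentences, min_count=2):
    """Score adjacent word pairs by pointwise mutual information.
    Pairs seen fewer than min_count times are skipped, since PMI is unreliable for rare pairs."""
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        tokens = sent.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue
        p_xy = c / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring pairs first: these are the multiword term candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

corpus = ["el disco duro falla", "compré un disco duro", "el disco duro nuevo", "el coche falla"]
candidates = pmi_bigrams(corpus)
print(candidates[0][0])  # ('disco', 'duro')
```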

Objectives

  • Building a corpus using web crawling techniques.
  • Developing a language identifier.
  • Developing an identifier and extractor of multiword terms.
  • Automatically extracting bilingual lexicons from non-parallel corpora.
  • Using these lexicons to maintain, improve and continuously update the dictionaries of the bidirectional machine translation system Open Trad.
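Galician and Spanish are closely related, so a language identifier for this pair typically relies on character n-gram statistics rather than on word lists. The following is a minimal sketch, not the project's identifier, assuming one training text per language and cosine similarity between character-trigram frequency profiles; the training sentences and function names are hypothetical and far too small for real use.

```python
from collections import Counter
import math

def trigram_profile(text):
    """Character-trigram counts of a lowercased text, with whitespace collapsed to single spaces."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def identify(text, profiles):
    """Return the language code whose trigram profile is most similar to the text's profile."""
    query = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))

# Hypothetical one-sentence training profiles.
profiles = {
    "gl": trigram_profile("a lingua galega non ten moitos recursos"),
    "es": trigram_profile("la lengua gallega no tiene muchos recursos"),
}
print(identify("non ten moitos", profiles))   # gl
print(identify("no tiene muchos", profiles))  # es
```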