A new statistical-linguistic method for machine translation based on non-parallel corpora and compositional semantics

The main objective of this project is to design and develop a new statistical-linguistic strategy for machine translation. The current paradigm in statistical machine translation (SMT) relies on aligned bilingual parallel corpora and on non-compositional segmentation. In contrast, the present proposal relies on non-parallel bilingual corpora and on compositional segmentation grounded in distributional semantics.
The main contribution of the project is the application of distributional compositional semantics to bilingual vector spaces built from non-parallel bilingual corpora, which are not necessarily comparable. This combination is novel in machine translation research. The proposed translation model follows a hybrid linguistic-statistical approach: first, it applies syntactic transfer rules; second, it uses vector spaces, built automatically from non-parallel corpora, to represent the distribution (that is, the contextual meaning) of linguistic expressions.
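
The following Python sketch illustrates, on toy data, how compositional distributional vectors built from non-parallel corpora could drive lexical selection in translation. Everything in it is an assumption for illustration, not the project's model: the seed dictionary, corpora and lexicons are hypothetical; vectors are sentence-level co-occurrence counts over dimensions aligned through the seed dictionary (a common way to compare words across non-parallel corpora); the phrasal verb is assumed to be already joined into one token; and the composition function is simple componentwise multiplication. Each Spanish candidate verb is composed with the translated object and ranked by cosine similarity against the composed English expression.

```python
import math
from collections import Counter

# Hypothetical seed dictionary (English context word, Spanish context word).
# Each pair defines one shared dimension of the bilingual vector space.
SEED = [("call", "llamada"), ("ring", "sonar"),
        ("school", "colegio"), ("parent", "padre")]

# Toy NON-parallel corpora: the Spanish sentences are not translations of the
# English ones. Phrasal verbs are assumed to be pre-joined (e.g. "pick_up").
EN_CORPUS = ["pick_up phone call ring", "pick_up child school parent",
             "phone call ring", "child school parent"]
ES_CORPUS = ["coger teléfono llamada sonar", "recoger niño colegio padre",
             "coger móvil llamada sonar", "teléfono llamada", "niño colegio"]

# Hypothetical transfer lexicon: candidate target verbs and object translations.
VERB_CANDIDATES = {"pick_up": ["coger", "recoger"]}
OBJECT_LEXICON = {"phone": "teléfono", "child": "niño"}

EN_DIMS = [en for en, _ in SEED]
ES_DIMS = [es for _, es in SEED]


def context_vector(word, corpus, dims):
    """Count sentence-level co-occurrences of `word` with each seed context word."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return [counts[d] for d in dims]


def compose(u, v):
    """Multiplicative composition: componentwise product of two vectors."""
    return [a * b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translate_verb(verb, obj):
    """Pick the Spanish verb whose composed (verb, object) vector best matches the source."""
    source = compose(context_vector(verb, EN_CORPUS, EN_DIMS),
                     context_vector(obj, EN_CORPUS, EN_DIMS))
    obj_es = context_vector(OBJECT_LEXICON[obj], ES_CORPUS, ES_DIMS)
    scores = {cand: cosine(source, compose(context_vector(cand, ES_CORPUS, ES_DIMS), obj_es))
              for cand in VERB_CANDIDATES[verb]}
    return max(scores, key=scores.get), scores


print(translate_verb("pick_up", "phone")[0])  # coger
print(translate_verb("pick_up", "child")[0])  # recoger
```

With these toy counts, "pick_up phone" selects coger and "pick_up child" selects recoger, because each Spanish candidate's contexts overlap with only one of the two object contexts. Multiplication is used here because it keeps only the dimensions shared by head and dependent; additive composition is a common alternative.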

Objectives

  • To design and develop a new statistical-linguistic strategy for machine translation based on the exploitation of non-parallel bilingual corpora and compositional semantics.
  • To implement an English-Spanish (en-es) translation system limited to clauses whose predicates contain the English verbal expressions known as phrasal verbs, which are characterized by high lexical ambiguity.
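
Because English phrasal verbs are often separable (pick up the child vs. pick the child up), a system restricted to them has to recognize the verb and its particle even when the object intervenes. The Python sketch below is one hypothetical way to do this, assuming lemmatized input tagged with Penn Treebank tags (RP marks particles) by some upstream tagger that is not part of the source. It joins verb and particle into a single unit such as pick_up, which a transfer rule could then map to a Spanish verb. It handles only transitive uses and takes the first noun or pronoun as the object head, a deliberate simplification.

```python
from typing import List, Optional, Tuple

# (lemma, Penn Treebank tag) pairs, assumed to come from a hypothetical upstream tagger.
Tagged = List[Tuple[str, str]]


def extract_phrasal_verb(clause: Tagged) -> Optional[Tuple[str, str]]:
    """Find a verb + particle (RP) pair, contiguous or separated by the object.

    Returns (joined phrasal verb, object head) or None. Simplification: the
    object head is the first noun or pronoun before the next verb or modal.
    """
    for i, (lemma, tag) in enumerate(clause):
        if not tag.startswith("VB"):
            continue
        # Only look inside this verb's span: stop at the next verb or modal.
        span = []
        for next_lemma, next_tag in clause[i + 1:]:
            if next_tag.startswith("VB") or next_tag == "MD":
                break
            span.append((next_lemma, next_tag))
        particles = [l for l, t in span if t == "RP"]
        objects = [l for l, t in span if t.startswith("NN") or t == "PRP"]
        if particles and objects:
            return f"{lemma}_{particles[0]}", objects[0]
    return None


# Separated particle: "she picked the child up"
print(extract_phrasal_verb([("she", "PRP"), ("pick", "VBD"), ("the", "DT"),
                            ("child", "NN"), ("up", "RP")]))  # ('pick_up', 'child')
```

Stopping at the next verb or modal keeps a particle from being attached to an earlier verb in the same sentence, as in "he said she would pick it up".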