Using the Outlier Detection Task to Evaluate Distributional Semantic Models

Title: Using the Outlier Detection Task to Evaluate Distributional Semantic Models
Authors: Pablo Gamallo
Type: Journal article
Source: Machine Learning and Knowledge Extraction, MDPI, Vol. 1, No. 1, pp. 211-223, 2018.
ISSN: 2504-4990
DOI: 10.3390/make1010013
Abstract: In this article, we define the outlier detection task and use it to compare neural-based word embeddings with transparent count-based distributional representations. Using the English Wikipedia as the training corpus, we observed that embeddings outperform count-based representations when contexts are defined as bags of words. However, there are no sharp differences between the two types of model when word contexts are defined as syntactic dependencies. In general, syntax-based models tend to perform better than bag-of-words models on this specific task. Experiments carried out for Portuguese yielded similar results. The test datasets we created for the outlier detection task in English and Portuguese are freely available.
Keywords: distributional semantics, dependency analysis, evaluation, word similarity
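
The abstract names the outlier detection task without spelling out how it works. As a rough illustration only, the sketch below assumes one common formulation: given a small set of words in which one does not belong, the predicted outlier is the word whose vector has the lowest average cosine similarity to the others. The function names (`cosine`, `detect_outlier`) and the toy vectors are hypothetical and are not taken from the article, which may define and score the task differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def detect_outlier(vectors):
    """Return the word with the lowest mean similarity to the other words.

    `vectors` maps each word to its vector. This scoring rule is an
    assumption made for illustration, not the article's exact procedure.
    """
    words = list(vectors)
    if len(words) < 2:
        raise ValueError("need at least two words to detect an outlier")

    def mean_similarity(word):
        others = [w for w in words if w != word]
        return sum(cosine(vectors[word], vectors[w]) for w in others) / len(others)

    return min(words, key=mean_similarity)


# Toy, hand-made vectors (hypothetical; a real evaluation would use
# vectors from a trained count-based or embedding model).
toy_vectors = {
    "apple": [0.90, 0.10, 0.00],
    "pear": [0.80, 0.20, 0.10],
    "banana": [0.85, 0.15, 0.05],
    "hammer": [0.10, 0.90, 0.30],
}

predicted = detect_outlier(toy_vectors)
```

Under this assumption, any model that maps words to vectors, whether count-based or neural, bag-of-words or dependency-based, can be plugged in through the `vectors` dictionary and compared on the same word sets.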