Data Mining for Automatic Linguistic Description of Data - Textual Weather Prediction as a Classification Problem

In this paper we present the results and performance of five different classifiers applied to the task of automatically generating textual weather forecasts from raw meteorological data. The methodology applies to template-based forecasts, which can be transformed into an intermediate language that can be directly mapped to classes (or values of variables). Experimental validation and tests of statistical significance were conducted on nine datasets from three real, publicly accessible meteorological websites. They show that RandomForest and IBk are statistically the best classifiers for this task in terms of F-score, with RandomForest providing slightly better results.

keywords: Linguistic descriptions of data, Natural language generation, Weather forecasting, Classification
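
As an illustration of the idea summarized in the abstract, the following minimal sketch shows how a template-based forecast might be mapped to class labels and back. The variable names (`sky`, `wind`), the class values, the phrases, and the template are all hypothetical and chosen only for illustration; they are not taken from the datasets used in the paper. Under these assumptions, a classifier only has to predict one class per variable from the raw data, and the text is then recovered from the template.

```python
# Hypothetical intermediate representation: each forecast variable takes one
# value from a small, fixed set of classes. All names here are illustrative.
PHRASES = {
    "sky": {
        "clear": "clear skies",
        "partly_cloudy": "partly cloudy skies",
        "overcast": "overcast skies",
    },
    "wind": {
        "calm": "calm winds",
        "moderate": "moderate winds",
        "strong": "strong winds",
    },
}

# Assumed forecast template with one slot per variable.
TEMPLATE = "Expect {sky} with {wind}."


def render_forecast(predicted: dict) -> str:
    """Turn predicted class labels into a template-based textual forecast."""
    slots = {var: PHRASES[var][label] for var, label in predicted.items()}
    return TEMPLATE.format(**slots)


def parse_forecast(text: str) -> dict:
    """Recover class labels from a templated forecast (to build training targets)."""
    labels = {}
    for var, phrases in PHRASES.items():
        for label, phrase in phrases.items():
            if phrase in text:
                labels[var] = label
    return labels


if __name__ == "__main__":
    classes = {"sky": "overcast", "wind": "strong"}
    text = render_forecast(classes)
    print(text)
    print(parse_forecast(text))
```

In this sketch, `parse_forecast` plays the role of converting existing forecasts into class labels for training, and `render_forecast` converts a classifier's predictions back into text.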