Doctoral meeting: 'Fuzzy Quantified Protoforms for Data-To-Text Systems: a new model with applications'

Friday, 26 February 2021 -
10:00 - 11:00
Microsoft Teams
Andrea Cascallar Fuentes (CiTIUS predoctoral researcher)

It is well known that organizations today generate and consume large amounts of data. However, these data have real value only insofar as they can be transformed into relevant and useful information that can be effectively conveyed to the people responsible for decision-making. Although data analysis tools are already very common, tools that communicate the results in a way human decision makers can understand are far less developed. This is where data-to-text (D2T) systems, which automatically generate texts from various sources of numerical or symbolic data, emerge as a technology of clear usefulness. An important task within these systems is analyzing the data to obtain the basic pieces of information that will later be integrated into the texts. This phase is called data processing or content determination, and improving it is one of the objectives of this thesis.

At the same time, many approaches have been proposed in the fuzzy logic and soft computing fields to describe data using linguistic terms. One example is Linguistic Descriptions of Data (LDD), which summarize one or more numerical variables and their values in linguistic form using the general notion of a protoform. These protoforms can follow several structures, with type-1 and type-2 linguistic descriptions being the most common in the literature.
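As a rough illustration of the two protoform types, the sketch below evaluates a type-1 sentence ("Q X are A", e.g. "most readings are warm") and a type-2 sentence ("Q D X are A", e.g. "most afternoon readings are warm") with Zadeh's sigma-count quantification. The data, the membership functions (warm, afternoon, most) and all function names are hypothetical and chosen only for this example; this is not the model proposed in the thesis.

```python
def trapezoid(a, b, c, d):
    """Build a trapezoidal membership function with support (a, d) and core [b, c]."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)  # rising edge
        return (d - x) / (d - c)      # falling edge
    return mu


def type1_truth(items, summarizer, quantifier):
    """Truth degree of 'Q X are A' using the relative sigma-count of A."""
    proportion = sum(summarizer(x) for x in items) / len(items)
    return quantifier(proportion)


def type2_truth(items, qualifier, summarizer, quantifier):
    """Truth degree of 'Q D X are A': sigma-count of (D and A) over sigma-count of D."""
    covered = sum(qualifier(x) for x in items)
    if covered == 0:
        return 0.0  # no item satisfies the qualifier to any degree
    both = sum(min(qualifier(x), summarizer(x)) for x in items)
    return quantifier(both / covered)


# Hypothetical fuzzy sets (assumptions for illustration only).
warm = trapezoid(15, 20, 25, 30)         # temperature in degrees Celsius
afternoon = trapezoid(11, 13, 17, 19)    # hour of the day
most = trapezoid(0.3, 0.8, 1.0, 1.0)     # proportional quantifier on [0, 1]

# Hypothetical readings as (hour, temperature) pairs.
readings = [(9, 12), (11, 18), (13, 22), (15, 25), (17, 27), (19, 30)]

t1 = type1_truth(readings, lambda r: warm(r[1]), most)
t2 = type2_truth(readings, lambda r: afternoon(r[0]), lambda r: warm(r[1]), most)

print(f"'Most readings are warm': {t1:.2f}")
print(f"'Most afternoon readings are warm': {t2:.2f}")
```

Other fuzzy quantification methods can assign different truth degrees to the same sentences, so the scores here depend on the sigma-count choice made in this sketch.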

The objective of this thesis is threefold:

  • First, we aim to improve and extend content determination in D2T systems by introducing new artificial intelligence techniques for the representation of imprecise knowledge and for intelligent search. Specifically, we will propose new models composed of fuzzy protoforms that include information and geographical relationships. We will design scalable algorithms that allow their extraction from several types of data.
  • Our second objective is to measure and compare the impact of the choice of fuzzy quantification method, assessing the empirical behavior of the different methods when applied to the evaluation of fuzzy quantified sentences.
  • Finally, our last objective is to design a model that covers the data-to-text pipeline for fuzzy quantified statements describing time series.
Supervisors: Alberto Bugarín and Alejandro Ramos