Psychological Features for Automatic Text Summarization

Automatic summarization requires conveying the important points of a long document in only a few sentences. Extractive summarization strategies select the most important sentences from the input document(s). We claim that standard features for estimating sentence importance can be effectively combined with novel features that encode psychological aspects of communication. We employ quantitative text analysis tools to estimate psychological features and inject them into state-of-the-art extractive summarizers. Our experiments demonstrate that this new set of features provides good guidance for selecting salient sentences. Our empirical study further shows that psychological features are best suited to hard summarization cases. This motivated us to formally define and study the problem of predicting summarization difficulty. We propose a number of predictors to model the difficulty of each summarization problem, and we evaluate several learning methods for this prediction task.

keywords: Automatic Text Summarization, Psychology of Natural Language Use, Linguistic Inquiry and Word Count, Predicting Summarization Difficulty