Improving Unstructured Text Summarization Using An Ensemble Approach

Sherif Elfayoumy, Jenny Thoppil

Abstract


The volume of text data is growing explosively, and organizations increasingly want to leverage their data corpora, especially now that Big Data platforms are available. As a result, there is rarely enough time to read and understand each document and to make decisions based on its contents. Hence, there is great demand for document summaries that can serve as a precise substitute for the original documents. In this article we present an ensemble approach that combines several well-researched text summarization techniques to produce better document summaries than the individual techniques do.
An experiment using the ensemble approach was designed, and its results were evaluated. For the experiment, the ensemble combined three algorithms: cosine similarity, enhanced latent semantic analysis (LSA) using singular value decomposition (SVD), and the maximal marginal relevance (MMR) measure. The ensemble was applied to two datasets, and the results were promising when compared with manual summaries written by human evaluators.
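
To make the idea of combining the three techniques concrete, the following is a minimal sketch of an extractive ensemble, not the authors' implementation. The abstract does not say how the individual scores are combined, so everything here is an assumption: the function names are hypothetical, sentences are represented by raw term counts, the LSA score is the singular-value-weighted length of each sentence vector in the reduced space (which may differ from the article's "enhanced" variant), MMR is turned into a score through its greedy selection rank, and the ensemble simply averages the min-max normalized scores.

```python
import re
import numpy as np

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def term_matrix(sentences):
    # Rows are sentences, columns are terms, values are raw term counts.
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokens for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    m = np.zeros((len(sentences), len(vocab)))
    for i, toks in enumerate(tokens):
        for t in toks:
            m[i, index[t]] += 1
    return m

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def centroid_scores(m):
    # Cosine similarity of each sentence to the whole-document term vector.
    centroid = m.sum(axis=0)
    return np.array([cosine(row, centroid) for row in m])

def lsa_scores(m, k=2):
    # SVD of the term-by-sentence matrix; each sentence is scored by the
    # singular-value-weighted length of its vector over the top k topics.
    _, s, vt = np.linalg.svd(m.T, full_matrices=False)
    k = min(k, len(s))
    return np.sqrt(((s[:k, None] * vt[:k]) ** 2).sum(axis=0))

def mmr_scores(m, lam=0.7):
    # Greedy MMR: trade relevance to the document against redundancy with
    # sentences already picked. Earlier picks receive higher scores.
    centroid = m.sum(axis=0)
    n = len(m)
    selected, remaining = [], list(range(n))
    while remaining:
        def mmr(i):
            redundancy = max((cosine(m[i], m[j]) for j in selected), default=0.0)
            return lam * cosine(m[i], centroid) - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    scores = np.zeros(n)
    for rank, i in enumerate(selected):
        scores[i] = (n - rank) / n
    return scores

def normalize(x):
    # Min-max scaling to [0, 1]; constant input maps to all zeros.
    span = x.max() - x.min()
    return (x - x.min()) / span if span else np.zeros_like(x)

def ensemble_summary(text, n_sentences=2):
    # Assumed combination rule: unweighted mean of the normalized scores.
    sentences = split_sentences(text)
    m = term_matrix(sentences)
    combined = (normalize(centroid_scores(m))
                + normalize(lsa_scores(m))
                + normalize(mmr_scores(m))) / 3
    top = sorted(np.argsort(-combined)[:n_sentences])
    return [sentences[i] for i in top]
```

Calling `ensemble_summary(document_text, n_sentences=3)` returns the three highest-scoring sentences in their original order. Averaging normalized scores is only one possible design; weighted averaging or voting over each technique's top-ranked sentences would fit the same structure.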


Keywords


text summarization; unstructured data; text mining; unstructured data analytics
