Automatic Text Summarization
Author: Monith Sourya • October 11, 2018
AI assignment - 1
R Monith Sourya | 2016A7PS0006H
Nikhil Sreekumar Nair | 2016A7PS0049H
Sanjay Devprasad | 2016A7PS0033H
Analysis Document
Necessity and Motivation
With the onset of the internet revolution, the amount of information available for study has exploded. To reach appropriate and accurate conclusions, a study needs the latest information available. An automatic text summarizer lets us condense documents intelligently and accurately, so that we can understand them better and in less time.
The main idea of summarization is to find a subset of data which contains the "information" of the entire set. Document summarization tries to create a representative summary or abstract of the entire document, by finding the most informative sentences.
Existing systems and approaches
Typical text summarization makes use of either an extractive or an abstractive model.
Extractive methods select a subset of the existing words, phrases, or sentences in the original text and concatenate them to form a summary. Extractive systems are quite robust because they reuse natural language phrases taken straight from the input. However, they lack flexibility: they cannot introduce novel words or connectors, and they cannot paraphrase the way humans can.
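The extractive idea described above can be illustrated with a minimal sketch. It is an assumption for illustration, not the method this project settles on: sentences are scored by the average frequency of their words across the document, and the top-scoring sentences are concatenated in their original order. The names split_sentences, tokenize, extractive_summary, and the tiny STOP_WORDS set are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "on", "for"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase words with stop words removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def extractive_summary(text, num_sentences=2):
    sentences = split_sentences(text)
    # Document-wide word frequencies.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average frequency of the sentence's words; empty sentences score 0.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the best, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Because the output is built only from sentences already present in the input, the sketch shows both the robustness and the limitation noted above: it never produces ungrammatical phrases of its own, but it also cannot paraphrase or add connectors.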
In contrast, abstractive methods build an internal semantic representation and then use natural language generation techniques to create a summary that is closer to what a human might express. Such a summary, while intended to keep the original content’s intent, can use words that were not in the original input. The primary goal of an abstractive model is to create more fluent and natural summaries. However, such a model constitutes a harder problem as we now require the model to generate coherent phrases and connectors.
Abstractive summarization is still a heavily researched field, which is why most established automatic text summarizers use extractive models. In extraction-based summarization, phrases and sentences are clustered using a graph-based, feature-based, topic-based, or grammar-based system (explained in a little more detail under Project Scope and Feasibility).
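As one illustration of the graph-based family mentioned above, the following sketch ranks sentences in the style of PageRank over a sentence-similarity graph. It is a simplified assumption, not a description of any particular established system: similarity is plain Jaccard word overlap, and the names tokens, similarity, and rank_sentences, along with the damping and iterations defaults, are hypothetical choices.

```python
import re


def tokens(sentence):
    # Set of lowercase words in the sentence.
    return set(re.findall(r"[a-z]+", sentence.lower()))


def similarity(a, b):
    # Jaccard overlap between the two sentences' word sets.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    if n == 0:
        return []
    # Weighted, undirected graph: edge weight = similarity, no self-loops.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours, in proportion to
        # edge weight. Simplification: isolated sentences pass nothing on.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

Sentences that share vocabulary with many other sentences accumulate higher scores, so taking the highest-scoring ones yields an extractive summary. A sentence unrelated to the rest of the text keeps only the small baseline score.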
Project Scope and Feasibility
We have elected to combine both approaches in our version of this project. Because hybrid summarization is itself a heavily researched topic at present, we were unable to find literature with established methods for it. We have therefore selected a few methods that we hope will work toward a hybrid of the two approaches. All of these ideas will be implemented and tested before the assignment is completed, and the best-performing combination will be selected.
...