Text Summarization Techniques
A Brief Overview of Different Extractive Approaches
Automatic text summarization, or simply text summarization, is the process of creating a short and coherent version of a longer document: distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks).
We (humans) are generally good at this type of task as it involves first understanding the meaning of the source document and then distilling the meaning and capturing salient details in the new description. As such, the goal of automatically creating summaries of text is to have the resulting summaries as good as those written by humans.
It is not enough to just generate words and phrases that capture the gist of the source document. The summary should be accurate and should read fluently as a new standalone document. Automatic text summarization is the task of producing a concise and fluent summary while preserving key information content and overall meaning.
Given errors resulting from speech recognition and the fact that spoken language is often less formal than written language, the most widely used method for single document text summarization, sentence extraction, cannot be directly applied to speech summarization. However, if systems exploit the additional information that can be derived from the speech signal and dialogue structure, extractive methods can be extended for spoken language and augmented by new methods that focus on extracting particular kinds of information and reformulating it appropriately. 
There are two types of summarization techniques: extractive and abstractive. Extractive methods function by identifying the important sentences or excerpts from the text and reproducing them verbatim as part of the summary. No new text is generated; only the existing text is used in the summarization process. This differs from abstractive methods, which employ more powerful natural language processing techniques to interpret the text and generate a new summary text.
Basically, there are three fairly independent tasks which all summarizers perform, as listed below; a minimal code sketch of this pipeline follows the list. In this article, we will look at different extractive topic representation approaches.
- Construct an intermediate representation of the input text which expresses the main aspects of the text.
- Score the sentences based on the representation.
- Select a summary consisting of several sentences.
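To make these three tasks concrete, here is a minimal, hypothetical sketch of an extractive summarizer in Python. It is an illustration only: the intermediate representation is a plain word-frequency table, the sentence splitting is naive, and none of the names come from the survey.

```python
from collections import Counter

def summarize(text, summary_length=3):
    # Naive sentence splitting; real systems use a proper sentence tokenizer.
    sentences = [s.strip() for s in text.split(".") if s.strip()]

    # Task 1: build an intermediate representation of the input
    # (here, simply a table of word frequencies).
    freq = Counter(w.lower() for s in sentences for w in s.split())

    # Task 2: score each sentence based on that representation
    # (here, the average frequency of its words).
    scores = [sum(freq[w.lower()] for w in s.split()) / max(len(s.split()), 1)
              for s in sentences]

    # Task 3: select a summary of the top-scoring sentences,
    # restored to their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:summary_length])
    return ". ".join(sentences[i] for i in chosen) + "."
```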
1. Topic Words
This method aims to identify words that describe the topic of the input document. There are two ways to compute the importance of a sentence: as a function of the number of topic signatures it contains, or as the proportion of topic signatures among its words. Both sentence scoring functions relate to the same topic representation; however, they might assign different scores to the same sentence. The first method may assign higher scores to longer sentences simply because they contain more words, whereas the second measures the density of the topic words.
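As a rough illustration of the difference between the two scoring functions, here is a small sketch. The set of topic signatures is assumed to be given (in practice it is usually derived with a log-likelihood ratio test); the names and example words below are made up.

```python
def score_by_count(sentence_words, topic_signatures):
    # Score = number of topic-signature words in the sentence.
    # Tends to favour longer sentences, which simply contain more words.
    return sum(1 for w in sentence_words if w in topic_signatures)

def score_by_density(sentence_words, topic_signatures):
    # Score = proportion of topic-signature words in the sentence,
    # i.e. how densely the sentence talks about the topic.
    if not sentence_words:
        return 0.0
    return score_by_count(sentence_words, topic_signatures) / len(sentence_words)

topic = {"summarization", "sentence", "topic"}
short = ["topic", "words", "drive", "summarization"]
longer = short + ["and", "this", "sentence", "just", "keeps", "adding", "filler", "words"]
print(score_by_count(short, topic), score_by_density(short, topic))    # 2 0.5
print(score_by_count(longer, topic), score_by_density(longer, topic))  # 3 0.25
```

The longer sentence wins under the count-based score but loses under the density score, which is exactly the trade-off described above.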
2. Frequency-driven Approaches
When assigning weights of words in topic representations, we can think of binary (0 or 1) or real-value (continuous) weights and decide which words are more correlated to the topic. The two most common techniques in this category are word probability and TFIDF (Term Frequency Inverse Document Frequency).
2.1 Word Probability: This is the simplest method; it uses word frequency as an indicator of importance. The probability of a word w is computed as the number of occurrences of the word, f(w), divided by the total number of words in the input, N (which can be a single document or multiple documents):
P(w) = f(w) / N
The method then picks the best-scoring sentence that contains the highest-probability word. This step ensures that the highest-probability word, which represents the topic of the document at that point, is included in the summary. The selection step repeats until the desired summary length is reached.
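A minimal sketch of this greedy selection loop follows. It is written in the spirit of SumBasic; the final re-weighting step (squaring the probabilities of already-covered words) is a common refinement that the description above does not spell out.

```python
from collections import Counter

def word_probability_summary(sentences, summary_length=3):
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter(w for words in tokenized for w in words)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}   # P(w) = f(w) / N

    chosen, remaining = [], list(range(len(sentences)))
    while remaining and len(chosen) < summary_length:
        # The highest-probability word represents the topic at this point.
        top_word = max(prob, key=prob.get)
        # Among sentences containing it, pick the best-scoring one
        # (scored here by average word probability).
        candidates = [i for i in remaining if top_word in tokenized[i]] or remaining
        best = max(candidates,
                   key=lambda i: sum(prob[w] for w in tokenized[i]) / max(len(tokenized[i]), 1))
        chosen.append(best)
        remaining.remove(best)
        # Down-weight covered words so later picks favour new content.
        for w in tokenized[best]:
            prob[w] = prob[w] ** 2

    return [sentences[i] for i in sorted(chosen)]
```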
2.2 TFIDF: This weighting technique assesses the importance of words and identifies very common words (that should be omitted from consideration) in the document(s) by giving low weights to words appearing in most documents. The weight of each word w in document d is computed as follows:
q(w) = f_d(w) * log(|D| / f_D(w))
where f_d(w) is the term frequency of word w in document d, f_D(w) is the number of documents that contain word w, and |D| is the number of documents in the collection D.
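As a small worked sketch, the weights can be computed from raw counts as below; libraries such as scikit-learn's TfidfVectorizer implement more refined variants with smoothing and normalization.

```python
import math
from collections import Counter

def tfidf_weights(documents):
    # `documents` is a list of token lists; returns one {word: weight} dict
    # per document, using q(w) = f_d(w) * log(|D| / f_D(w)).
    num_docs = len(documents)
    doc_freq = Counter(w for doc in documents for w in set(doc))   # f_D(w)

    weights = []
    for doc in documents:
        term_freq = Counter(doc)                                   # f_d(w)
        weights.append({w: tf * math.log(num_docs / doc_freq[w])
                        for w, tf in term_freq.items()})
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
print(tfidf_weights(docs)[0])
# "the" appears in every document, so its weight is log(3/3) = 0,
# while rarer words like "sat" get higher weights.
```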
3. Latent Semantic Analysis
This method first builds a term-sentence matrix A (an n by m matrix), where each row corresponds to a word from the input (n words) and each column corresponds to a sentence (m sentences). Each entry a_ij of the matrix is the weight of word i in sentence j. The weights are computed with the TFIDF technique, and if a sentence does not contain a word, that word's weight in the sentence is zero. Singular value decomposition (SVD) is then applied, factorizing A into three matrices U, S and V.
- Matrix U (n × m) represents a term-topic matrix containing the weights of words.
- Matrix S is a diagonal matrix (m × m) whose diagonal entry i corresponds to the weight of topic i.
- Matrix V is the topic-sentence matrix.
- The matrix D = SV describes how much a sentence represents a topic; thus, d_ij shows the weight of topic i in sentence j.
The initial method was to choose one sentence per topic; the number of topics retained was therefore set to the desired summary length in sentences. This strategy has a drawback, because a topic may need more than one sentence to convey its information. It was later enhanced to leverage the weight of each topic to decide the relative share of the summary that should cover that topic, which gives the flexibility of selecting a variable number of sentences per topic.
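A compact NumPy sketch of the basic strategy (one sentence per topic) is shown below; the term-sentence matrix is assumed to have been built with TFIDF weights as described above.

```python
import numpy as np

def lsa_summary(sentences, term_sentence_matrix, summary_length=3):
    # term_sentence_matrix is the n x m matrix A (rows = words, columns = sentences).
    # SVD factorizes A; rows of Vt correspond to topics, columns to sentences.
    U, S, Vt = np.linalg.svd(term_sentence_matrix, full_matrices=False)

    # D = S * Vt: entry D[i, j] is the weight of topic i in sentence j.
    D = np.diag(S) @ Vt

    chosen = []
    for topic in range(min(summary_length, D.shape[0])):
        # For each of the strongest topics, take its highest-weighted sentence
        # that has not been selected yet.
        order = np.argsort(-np.abs(D[topic]))
        pick = next(j for j in order if j not in chosen)
        chosen.append(int(pick))

    return [sentences[j] for j in sorted(chosen)]
```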
4. Bayesian Topic Models
Two major limitations of the above-mentioned approaches are: 1) They consider the sentences as independent of each other, so topics embedded in the documents are disregarded. 2) Sentence scores computed by most existing approaches typically do not have unambiguous probabilistic interpretations, and many of the sentence scores are calculated using heuristics.
Bayesian topic models are probabilistic models that uncover and represent the topics of documents. They are quite powerful and appealing because they explicitly represent the information (i.e. the topics) that is lost in the other approaches. Their advantage in describing and representing topics in detail enables the development of summarization systems which can determine the similarities and differences between documents to be used in summarization.
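For illustration, here is a minimal sketch of fitting LDA, the most widely used Bayesian topic model, with scikit-learn. It only shows the topic representation itself; how the learned topic distributions are turned into sentence scores differs between summarizers and is not shown here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "The central bank raised interest rates to curb inflation.",
    "Researchers trained a neural network to summarize long documents.",
    "Stock markets fell after the bank announced the rate decision.",
]

# Bag-of-words counts are the usual input for LDA.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)

# Fit a two-topic model; doc_topics[d, t] is the weight of topic t in document d.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Inspect the top words of each topic -- the explicit topic representation
# that heuristically scored approaches never expose.
terms = vectorizer.get_feature_names_out()
for t, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:4]]
    print(f"topic {t}: {top}")
print(doc_topics.round(2))
```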
This article is heavily inspired by and based on the paper “Text Summarization Techniques: A Brief Survey”, which details different text summarization techniques. Abstractive summarization is a totally different beast, as it is expected to write a summary the way a human would, in its own words, most of the time. Abstractive methods are still in their infancy when compared to extractive methods, and there is no well-defined mechanism to measure their performance. However, they have huge potential, and we will certainly see several breakthroughs in the coming days.
Kathleen McKeown, Julia Hirschberg, Michel Galley, and Sameer Maskey. 2016. From Text Summarization to Speech Summarization.
Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, and Krys Kochut. 2017. Text Summarization Techniques: A Brief Survey.