Description
With the advent of Internet, the data being added online is increasing at enormous rate. Though search engines are using IR techniques to facilitate the search requests from users, the results are not effective towards the search query of the

With the advent of Internet, the data being added online is increasing at enormous rate. Though search engines are using IR techniques to facilitate the search requests from users, the results are not effective towards the search query of the user. The search engine user has to go through certain webpages before getting at the webpage he/she wanted. This problem of Information Overload can be solved using Automatic Text Summarization. Summarization is a process of obtaining at abridged version of documents so that user can have a quick view to understand what exactly the document is about. Email threads from W3C are used in this system. Apart from common IR features like Term Frequency, Inverse Document Frequency, Term Rank, a variation of page rank based on graph model, which can cluster the words with respective to word ambiguity, is implemented. Term Rank also considers the possibility of co-occurrence of words with the corpus and evaluates the rank of the word accordingly. Sentences of email threads are ranked as per features and summaries are generated. System implemented the concept of pyramid evaluation in content selection. The system can be considered as a framework for Unsupervised Learning in text summarization.
Reuse Permissions
  • Downloads
    pdf (1 MB)

    Details

    Title
    • Automatic text summarization using importance of sentences for email corpus
    Contributors
    Date Created
    2015
    Resource Type
  • Text
  • Collections this item is in
    Note
    • Partial requirement for: M.S., Arizona State University, 2015
      Note type
      thesis
    • Includes bibliographical references (pages 31-32)
      Note type
      bibliography
    • Field of study: Computer science

    Citation and reuse

    Statement of Responsibility

    by Sravan Nadella

    Machine-readable links