Full metadata
Title
Faceted search and browsing of Indonesian text collection using shallow parsing techniques
Description
Text search is a very useful way of retrieving document information from a particular website. The public generally prefers internet search engines to local enterprise search engines, because enterprise content is not cross-linked and is not ranked by a PageRank-style algorithm. On the other hand, an enterprise search engine can use metadata information, which allows the user to specify conditions that any retrieved document should meet. Metadata is therefore very useful for searching. My thesis aims to develop an enterprise search engine that uses metadata information to provide advanced features such as faceted navigation. The search engine data was extracted from various Indonesian web sources. Metadata such as person, organization, location, and sentiment-analysis keyword entities must be tagged in each document to provide faceted search capability. A shallow parsing technique, named entity recognition, is used for this purpose. More than 1500 entities have been tagged in this process. The tagged documents have been successfully converted into XML format and indexed with "Apache Solr", an open source enterprise search engine with full-text search and faceted search capabilities. The entities help users specify conditions and search faster through the large collection of documents. Clicking on a metadata condition always returns results. Since the sentiment-analysis keywords are tagged with positive and negative values, social scientists can use these results to check for overlapping or conflicting organizations and ideologies. In addition, this tool is the first of its kind for the Indonesian language. The results are fetched much faster and with better accuracy.
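The pipeline described above (tagged entities, conversion to XML, indexing in Apache Solr, faceted queries) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the field names (`id`, `text`, `person`, `organization`, `location`), the helper names, and the sample values are all assumptions chosen for illustration. It only builds a document in Solr's XML update format and a query string using Solr's standard `facet`, `facet.field`, and `fq` parameters; it does not contact a server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def to_solr_add_xml(doc_id, text, entities):
    """Build a Solr XML <add> message for one tagged document.

    `entities` maps a (hypothetical) facet field name to the list of
    entity strings that a named entity recognizer found in the text.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name="text").text = text
    # One <field> element per entity value, so the field is multi-valued.
    for field, values in entities.items():
        for value in values:
            ET.SubElement(doc, "field", name=field).text = value
    return ET.tostring(add, encoding="unicode")


def facet_query_params(query, facet_fields, filters=None):
    """Build the query string for a faceted Solr search.

    `filters` maps a facet field to the value the user clicked on;
    each entry becomes an `fq` filter query.
    """
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", f) for f in facet_fields]
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)


# Illustrative values only; not taken from the thesis data.
xml_message = to_solr_add_xml(
    "doc-1",
    "Contoh teks berita",
    {"person": ["Budi"], "location": ["Jakarta"]},
)
query_string = facet_query_params(
    "pemilu", ["person", "organization", "location"], {"location": "Jakarta"}
)
```

Because a filter is built only from a facet value that Solr has already counted in the current result set, selecting it narrows the results without emptying them, which is the behavior the abstract describes for clicking a metadata condition.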
Date Created
2010
Contributors
- Sanaka, Srinivasa Raviteja (Author)
- Davulcu, Hasan (Thesis advisor)
- Sen, Arunabha (Committee member)
- Taylor, Thomas (Committee member)
- Arizona State University (Publisher)
Topical Subject
Resource Type
Extent
vii, 42 p. : col. ill
Language
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.8705
Statement of Responsibility
by Srinivasa Raviteja Sanaka
Description Source
Viewed on Sept. 20, 2012
Level of coding
full
Note
Partial requirement for: M.S., Arizona State University, 2010
Note type
thesis
Includes bibliographical references (p. 40-42)
Note type
bibliography
Field of study: Computer science
System Created
- 2011-08-12 02:49:42
System Modified
- 2021-08-30 01:56:35
Additional Formats