Ranking unstructured documents in IR (A comparative study of vector space model and latent semantic indexing model)
Date
2012
Publisher
Department of Computer Science and Information Technology
Abstract
For thousands of years, people have recognized the importance of archiving and finding
information. With the advent of computers, it became possible to store large amounts of
information, and finding useful information in such collections became a necessity. The
field of Information Retrieval (IR) was born in the 1950s out of this necessity. Over the last
fifty years, the field has matured considerably, and several IR systems are now used every
day by a wide variety of users. The goal of IR is to provide users
with the documents that satisfy their information need.
Various information retrieval models have been implemented, such as the Boolean
Model, the Vector Space Model, and the Probabilistic Model. Among these, the Vector
Space Model (VSM) and the Latent Semantic Indexing (LSI) Model are promising models
that remain in use today. The main concern of this study is to rank documents and to determine
whether the LSI Model overcomes the problems that VSM has with
synonymy and polysemy when ranking documents.
The implemented features of these models are presented, including how to represent documents
and queries as vectors in $\mathbb{R}^{|v|}$, the term-document matrix, term weighting, cosine similarity,
singular value decomposition (SVD), and dimensionality reduction and its effect on the results of LSI.
Precision and recall have been computed to measure the effectiveness of the system.
Conclusions have been drawn, and recommendations for future
improvement have been provided.
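The pipeline named in the abstract can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the toy collection, the function names, the use of raw term counts in place of a term-weighting scheme, the rank k = 2, the score threshold of 0.5, and the relevance judgments are all assumptions made for illustration. The query is folded into the latent space as U_k^T q, which is one of several common conventions.

```python
import numpy as np

def term_document_matrix(docs, vocab):
    """Raw term counts: rows are terms, columns are documents (no weighting)."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.lower().split():
            if token in index:
                A[index[token], j] += 1
    return A

def cosine_scores(query_vec, doc_matrix):
    """Cosine similarity between the query and every column of doc_matrix."""
    norms = np.linalg.norm(doc_matrix, axis=0) * np.linalg.norm(query_vec)
    norms[norms == 0] = 1.0  # avoid division by zero for empty vectors
    return (query_vec @ doc_matrix) / norms

def lsi_scores(A, query_vec, k):
    """Cosine similarity after projecting onto the top-k singular directions."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    docs_k = np.diag(s[:k]) @ Vt[:k, :]   # k x n document coordinates
    query_k = U[:, :k].T @ query_vec      # query folded into the same space
    return cosine_scores(query_k, docs_k)

def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy collection: "car" and "automobile" are synonyms.
docs = [
    "car engine repair",
    "automobile engine repair",
    "apple banana fruit",
]
vocab = ["car", "automobile", "engine", "repair", "apple", "banana", "fruit"]
relevant = {0, 1}  # assumed relevance judgments for the query "car"

A = term_document_matrix(docs, vocab)
q = term_document_matrix(["car"], vocab)[:, 0]

vsm = cosine_scores(q, A)
lsi = lsi_scores(A, q, k=2)

threshold = 0.5  # assumed cut-off for treating a document as retrieved
vsm_pr = precision_recall(np.flatnonzero(vsm > threshold), relevant)
lsi_pr = precision_recall(np.flatnonzero(lsi > threshold), relevant)

print("VSM scores:", np.round(vsm, 3), "precision/recall:", vsm_pr)
print("LSI scores:", np.round(lsi, 3), "precision/recall:", lsi_pr)
```

On this toy collection the query "car" shares no term with the second document, so plain cosine similarity scores it zero, while the rank-2 projection places it next to the first document because both co-occur with "engine" and "repair". A collection this small says nothing about how the two models compare on real data.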
Keywords
Unstructured documents, Indexing model