Index Structure For Metadata Extracted From Large Hypertext Collections

Pathak, Achyut Pd.

Files

THESIS.pdf (317.01 KB)

Date

2008

Authors

Pathak, Achyut Pd.

Publisher

Department of Computer Science

Abstract

A growing amount of hypertext data can be found in contexts such as weblogs and online journals, intranet webs, the World Wide Web (WWW), online communities, intra-organizational wikis, and other collaborative content management platforms. In such collections, the combination of content and hyperlink structure reveals information about phenomena such as the existence of cyber communities, the documents similar to a given document, the popularity and importance of documents, and the probability of reaching a document from any other document by following a sequence of hyperlinks. All of these can be determined by analyzing the hypertext web, so many kinds of analysis can be performed on hypertext collections. Such analysis requires locating information in the collection, which in turn requires an index. Because a hypertext database is large, an efficient index structure is needed. Keys are used both to construct the index and to search it, and the URLs of web pages serve as the keys for hypertext collections. Since URLs vary in length, the index must support variable-length keys. To meet these requirements, a multilevel index supporting variable-length keys has been constructed as an index for hypertext collections.
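
The general idea of a multilevel index keyed by variable-length URLs can be illustrated with a minimal Python sketch. The abstract does not describe the thesis's actual design, so everything here is an assumption made for illustration: the class name MultilevelIndex, the block_size parameter, the two-level layout, the in-memory storage, and the example URLs are all hypothetical. The sketch sorts (URL, record id) pairs, groups them into leaf blocks, and keeps the first URL of each block in an upper level, so a lookup does one binary search per level. Python strings stand in for variable-length keys.

```python
from bisect import bisect_left, bisect_right


class MultilevelIndex:
    """Static two-level index over variable-length string keys (URLs).

    Illustrative only: this is not the structure built in the thesis.
    URLs are assumed to be unique.
    """

    def __init__(self, entries, block_size=4):
        # Sort the (url, record_id) pairs by URL; string comparison
        # handles keys of any length.
        items = sorted(entries, key=lambda pair: pair[0])
        # Leaf level: consecutive blocks of at most block_size pairs.
        self.blocks = [
            items[i:i + block_size] for i in range(0, len(items), block_size)
        ]
        # Upper level: the first (smallest) URL of each leaf block.
        self.separators = [block[0][0] for block in self.blocks]

    def lookup(self, url):
        """Return the record id stored for url, or None if it is absent."""
        # Upper level: find the last block whose first key is <= url.
        pos = bisect_right(self.separators, url) - 1
        if pos < 0:
            return None
        # Leaf level: binary search inside the chosen block.
        block = self.blocks[pos]
        keys = [key for key, _ in block]
        i = bisect_left(keys, url)
        if i < len(keys) and keys[i] == url:
            return block[i][1]
        return None


# Hypothetical example data: URLs of different lengths mapped to record ids.
index = MultilevelIndex(
    [
        ("http://example.org/b", 2),
        ("http://example.org/a", 1),
        ("http://example.org/a/long/path/page.html", 3),
    ],
    block_size=2,
)
record = index.lookup("http://example.org/a/long/path/page.html")
```

A disk-based index would typically store each level in fixed-size pages and add further levels as the collection grows; the sketch keeps everything in memory to show only the search path from the upper level to a leaf block.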