Conflation Methods and Spelling Mistakes - A Sensitivity Analysis in Information Retrieval

Abstract

In some information retrieval scenarios, such as internal help desk systems, texts are entered into the document collection without proofreading. This can result in a relatively high number of spelling mistakes, which can skew the ranking of the documents retrieved for a query or even prevent relevant documents from being retrieved at all. We focus on addressing this problem at the conflation stage of the retrieval process and evaluate whether conflation based on n-grams, which is said to be insensitive to misspellings, leads to better retrieval quality than commonly used stemming algorithms. To do so, we run tests on artificially corrupted test collections and examine which characteristics of the queries and the relevant documents influence the relative retrieval quality achieved with the different conflation methods.
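To illustrate why n-gram conflation is said to be insensitive to misspellings, the following minimal sketch compares words by the overlap of their character trigrams. It is an illustration only, not the method evaluated in the paper: the function names `char_ngrams` and `dice_similarity`, the padding character `_`, the choice of n = 3, and the use of the Dice coefficient are all assumptions made for this example.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word.

    The word is lowercased and padded with '_' (an assumed padding
    symbol) so that leading and trailing characters also form n-grams.
    """
    padded = "_" * (n - 1) + word.lower() + "_" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=3):
    """Dice coefficient over the n-gram sets of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty inputs are treated as identical
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    # A transposition error still shares most trigrams with the correct word,
    # whereas an unrelated word shares few or none.
    print(dice_similarity("retrieval", "retreival"))
    print(dice_similarity("retrieval", "document"))
```

A misspelled term such as "retreival" keeps a substantial share of its trigrams in common with "retrieval", so an n-gram based index can still match it partially. A stemmer, by contrast, would typically map the misspelling to a different stem and lose the match entirely.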
