Abstract
In most classical information retrieval models, documents are represented as bags of words. This representation takes into account term frequency (tf) and inverse document frequency (idf) but ignores term proximity. Recently, proximity among query terms has been observed to improve document retrieval performance. Several retrieval applications provide tools for specifying term proximity at the query formulation level and rank documents based on the relative positions of the query terms within them. Such systems must store all proximity data in the index, which leads to a large index and slows the search. More recently, many models have represented each query term as a term signal, and the query is transformed from the time domain to the frequency domain using techniques such as the wavelet transform. The Discrete Wavelet Transform (DWT) is a multiresolution technique in which different frequencies are analyzed at different resolutions. Its advantage is that it considers the spatial information of the query terms within the document rather than only their counts. In this paper, to improve the ranking score and the run-time efficiency of query resolution while keeping the index at a reasonable size, three types of spectral analysis based on semantic segmentation are carried out: sentence-based segmentation, paragraph-based segmentation, and fixed-length segmentation. In addition, different term weights are applied according to term position.
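To make the term-signal idea concrete, the following is a minimal sketch rather than the implementation used in this paper. It assumes fixed-length segmentation into a power-of-two number of bins and uses the Haar wavelet as the simplest DWT. The function names `term_signal` and `haar_dwt`, the bin count, and the choice of Haar are illustrative assumptions.

```python
import math


def term_signal(tokens, term, num_bins=8):
    """Build a term signal: the count of `term` in each of `num_bins`
    equal-width position bins of the document (fixed-length segmentation).
    Hypothetical helper, named here for illustration only."""
    signal = [0.0] * num_bins
    n = len(tokens)
    for pos, tok in enumerate(tokens):
        if tok == term:
            # Map the token position to its bin index in [0, num_bins).
            signal[pos * num_bins // n] += 1.0
    return signal


def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a signal whose length is a
    power of two. Returns [overall approximation, coarsest detail, ...,
    finest details]. Hypothetical helper, assumed for this sketch."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficients capture where neighbouring bins differ.
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        # Approximation coefficients summarise each pair at a coarser scale.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        # Prepend so that coarser levels come before finer ones.
        details = detail + details
    return approx + details


tokens = "a b a c d e f a".split()
signal = term_signal(tokens, "a", num_bins=4)
coefficients = haar_dwt(signal)
```

The first coefficient reflects the overall number of occurrences, much like a term count, while the remaining coefficients encode where in the document the occurrences fall at progressively finer resolutions. Sentence-based or paragraph-based segmentation would replace the equal-width binning with bins aligned to sentence or paragraph boundaries.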