Abstract
Content-based image retrieval (CBIR) has been studied for decades. From conventional handcrafted local features to CNN-based methods, many works have achieved the best retrieval performance by using query expansion, average query expansion, and query fusion techniques. This work presents a novel approach that revisits the large-scale Oxford Buildings and Paris Buildings image retrieval benchmarks using both SIFT-based and CNN-based features. We revisit two image retrieval methods and combine them to improve retrieval performance. We also describe annotation errors in these benchmarks that have not been discussed previously. New, more extensive queries are added to each dataset, which makes the query phase more challenging. For feature extraction, we use the VGG-16 network and RootSIFT. Triangulation embedding (T-embedding) and democratic aggregation are then applied to the local descriptors. Query expansion, a widely used technique for improving retrieval accuracy, is used to validate the proposed pipeline. Our framework achieves state-of-the-art retrieval results compared with other CBIR methods.
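The abstract names RootSIFT and query expansion without describing them. The following is a minimal pure-Python sketch of two of these building blocks, under stated assumptions; it is not the authors' implementation. RootSIFT L1-normalizes a non-negative SIFT descriptor and then takes the element-wise square root. Average query expansion averages the query with its top-k results, re-normalizes the result, and re-ranks the database. The function names (`root_sift`, `average_query_expansion`, `rank`), the parameter `k`, and the toy vectors are hypothetical. VGG-16 features, T-embedding, and democratic aggregation are not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def root_sift(desc: Sequence[float], eps: float = 1e-12) -> Vector:
    """RootSIFT: L1-normalize a non-negative SIFT descriptor, then take the
    element-wise square root. The output has (approximately) unit L2 norm, and
    the dot product of two such vectors equals the Hellinger kernel of the
    L1-normalized originals. Assumes all entries are non-negative."""
    total = sum(desc) + eps
    return [math.sqrt(x / total) for x in desc]


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product; equals cosine similarity for L2-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank(query: Sequence[float], database: Sequence[Sequence[float]]) -> List[int]:
    """Return database indices sorted by decreasing similarity to the query
    (all vectors assumed L2-normalized)."""
    return sorted(range(len(database)), key=lambda i: dot(query, database[i]), reverse=True)


def average_query_expansion(query: Sequence[float],
                            database: Sequence[Sequence[float]],
                            k: int = 3) -> List[int]:
    """Average query expansion: sum the query with its top-k results,
    re-normalize, and re-rank. Dividing by (k + 1) is omitted because it does
    not change the ranking after L2 normalization."""
    top = rank(query, database)[:k]
    expanded = [query[d] + sum(database[i][d] for i in top) for d in range(len(query))]
    return rank(l2_normalize(expanded), database)


if __name__ == "__main__":
    print(root_sift([1.0, 3.0, 0.0, 0.0]))
    db = [[1.0, 0.0], [0.0, 1.0], l2_normalize([1.0, 1.0])]
    print(average_query_expansion([1.0, 0.0], db, k=2))
```

In a real pipeline the database entries would be the aggregated global descriptors of the indexed images. The value of `k` here is only an illustrative choice.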