Abstract
• Improved architectures for two-channel and Siamese networks are proposed.
• Proposed networks employ dense convolutional layers following advances in deep CNNs.
• Results on both metric and descriptor learning tasks are presented.
• Results on two benchmark datasets demonstrate the efficiency of the networks.
• Descriptors from the Siamese network can be computed efficiently using a GPU.
Descriptor and metric learning using deep convolutional neural networks (CNNs) has drawn the attention of computer vision researchers because of its remarkable performance compared with traditional methods. Several network types, such as two-channel, Siamese and triplet networks, have recently been proposed to learn a metric or a low-dimensional embedding from image patches, and they have outperformed traditional local descriptors such as the scale-invariant feature transform (SIFT). Recent works employ plain CNNs built by stacking several convolutional layers. In this article, we follow a recent approach called Deep Compare for metric and descriptor learning and propose improved architectures for two-channel and Siamese networks. Our modification is inspired by the dense convolutional network architecture known as DenseNet. The proposed two-channel and Siamese networks employ dense convolutional layers, which reuse feature maps from preceding layers. Our networks, trained with pairs of patches, outperform the Deep Compare networks by a significant margin, which justifies the proposed architecture, and obtain results comparable to triplet networks on the UBC benchmark dataset. Moreover, we obtain promising results on the patch verification, image matching and patch retrieval tasks of the large-scale HPatches benchmark dataset using the descriptors from our Siamese network.
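The feature reuse mentioned above can be illustrated with a minimal sketch of dense connectivity. It is an illustration only, not the architecture proposed in the article. The names `conv1x1_relu` and `dense_block`, the random weights, the 1x1 convolution and the layer count and growth rate are all assumptions chosen for brevity. The two input channels loosely mimic a pair of grayscale patches stacked as in a two-channel network.

```python
import random

def conv1x1_relu(channels, weights):
    """Apply a 1x1 convolution followed by ReLU (a simplified stand-in layer).

    channels: list of C feature maps, each a flat list of pixel values.
    weights:  list of K rows with C coefficients each (one output map per row).
    """
    n_pixels = len(channels[0])
    out = []
    for row in weights:
        fmap = [max(0.0, sum(w * ch[p] for w, ch in zip(row, channels)))
                for p in range(n_pixels)]
        out.append(fmap)
    return out

def dense_block(x, num_layers, growth_rate, rng):
    """Dense connectivity: each layer receives the concatenation of the block
    input and all earlier layer outputs, and adds growth_rate new maps."""
    features = list(x)
    for _ in range(num_layers):
        # Hypothetical random weights; a trained network would learn these.
        weights = [[rng.uniform(-1.0, 1.0) for _ in features]
                   for _ in range(growth_rate)]
        new_maps = conv1x1_relu(features, weights)
        features = features + new_maps  # channel-wise concatenation
    return features

rng = random.Random(0)
# Two input channels of 16 pixels each (4x4 patches, flattened).
patch = [[rng.random() for _ in range(16)] for _ in range(2)]
out = dense_block(patch, num_layers=3, growth_rate=4, rng=rng)
print(len(out))  # 2 + 3 * 4 = 14 channels
```

The input maps pass through the block unchanged and remain available to every later layer, so the channel count grows linearly with depth (here from 2 to 14). A plain stacked CNN, in contrast, passes only the latest layer's output forward.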