Abstract
Road segmentation in high spatial resolution satellite images is an important research topic with numerous applications in traffic monitoring and intelligent transportation systems. As urban populations grow and urban environments change rapidly, road databases require frequent updates. There is therefore a need for a robust system that automatically analyzes high spatial resolution satellite images and extracts road networks. Such a system can be used to keep road databases up to date and to provide meaningful support to intelligent transportation systems. However, road extraction from aerial images is challenging due to cluttered backgrounds, inter-class correlation, and occlusions in the scene. To address these problems, we propose an encoder–decoder network, DSMSA-Net, integrated with attention units for road segmentation in high spatial resolution satellite images. The encoder part of the network extracts multi-scale features from different convolutional layers. The decoder part consists of two modules: the Scale Attention Unit (SaAU) and the Spatial Attention Unit (SpAU). The first module, SaAU, uses feature maps from different residual blocks of the encoder to extract multi-scale information. The second module, SpAU, improves the spatial representation of the region of interest and extracts meaningful contextual information. We evaluate the proposed framework on two challenging, publicly available benchmark datasets: the DeepGlobe and Massachusetts road datasets. Quantitative and qualitative comparisons demonstrate that the proposed framework outperforms the reference methods.
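To make the roles of the two decoder modules concrete, the following is a minimal sketch in plain Python and not the DSMSA-Net implementation. It assumes that scale attention can be approximated by a softmax-weighted fusion of feature maps from different encoder stages, and spatial attention by a per-pixel sigmoid gate. The function names (`upsample_nearest`, `scale_attention`, `spatial_attention`) and the parameter-free design are illustrative assumptions; the actual SaAU and SpAU presumably use learned layers that the abstract does not specify.

```python
import math

# Feature maps are nested lists indexed as fmap[channel][row][col].
# Everything below is a parameter-free, hypothetical stand-in for the
# SaAU and SpAU modules; the real units are not specified in the abstract.

def shape(fmap):
    """Return (channels, height, width) of a nested-list feature map."""
    return len(fmap), len(fmap[0]), len(fmap[0][0])

def upsample_nearest(fmap, factor):
    """Repeat every pixel `factor` times along both spatial axes."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scale_attention(fmaps):
    """Assumed SaAU stand-in: fuse same-shape maps from several scales.

    Each scale receives one weight, taken from a softmax over its global
    mean activation; the output is the weighted sum of the maps.
    """
    c, h, w = shape(fmaps[0])
    means = [sum(v for ch in f for row in ch for v in row) / (c * h * w)
             for f in fmaps]
    weights = softmax(means)
    fused = [[[sum(wt * f[k][i][j] for wt, f in zip(weights, fmaps))
               for j in range(w)] for i in range(h)] for k in range(c)]
    return fused, weights

def spatial_attention(fmap):
    """Assumed SpAU stand-in: gate each pixel by a sigmoid of its channel mean."""
    c, h, w = shape(fmap)
    mask = [[1.0 / (1.0 + math.exp(-sum(fmap[k][i][j] for k in range(c)) / c))
             for j in range(w)] for i in range(h)]
    out = [[[fmap[k][i][j] * mask[i][j] for j in range(w)]
            for i in range(h)] for k in range(c)]
    return out, mask

# Toy usage: one fine map (1 x 4 x 4) and one coarse map (1 x 2 x 2).
fine = [[[0.0, 1.0, 1.0, 0.0] for _ in range(4)]]
coarse = [[[0.0, 2.0], [0.0, 2.0]]]
fused, weights = scale_attention([fine, upsample_nearest(coarse, 2)])
refined, mask = spatial_attention(fused)
```

In this toy setup the coarse map is first brought to the resolution of the fine map, the two are fused with per-scale weights that sum to one, and the fused map is then reweighted pixel by pixel. A trainable version would replace the fixed pooling and gating with learned convolutions.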