Abstract
Crowd counting and Crowd density map estimation face several challenges, including occlusions, non-uniform density, and intra-scene scale and perspective variations. Significant progress has been made in the development of most crowd counting approaches in recent years, especially with the emergence of deep learning and massive crowd datasets. The purpose of this work is to address the problem of crowd density estimatation in both sparse and crowded situations. In this paper, we propose a multi-task attention based crowd counting network (MACC Net), which consists of three contributions: 1) density level classification, which offers the global contextual information for the density estimation network; 2) density map estimation; and 3) segmentation guided attention to filter out the background noise from the foreground features. The proposed MACC Net is evaluated on four popular datasets including ShanghaiTech, UCF-CC-50, UCF-QRNF, and a recently launched dataset HaCrowd. The MACC Net achieves the state of the art in estimation when applied to HaCrowd and UCF-CC-50, while on the others, it obtains competitive results.