Abstract
We propose a novel deep learning approach for effective dense crowd counting by characterizing scattered occlusions, named CSONet. CSONet recognizes the implications of event-induced, scene-embedded, and multitudinous obstacles such as umbrellas and picket signs to achieve an accurate crowd analysis result. CSONet is the first deep learning model for characterizing scattered occlusions of effective dense-mode crowd counting to the best of our knowledge. We have collected and annotated two new scattered occlusion object datasets, which contain crowd images occluded with umbrellas (cso-umbrellas dataset) and picket signs (cso-pickets dataset). We have designed and implemented a new crowd overfit reduction network by adding both spatial pyramid pooling and dilated convolution layers over modified VGG16 for capturing high-level features of extended receptive fields. CSONet was trained on the two new scattered occlusion datasets and the ShanghaiTech A and B datasets. We also have built an algorithm that merges scattered object maps and density heatmaps of visible humans to generate a more accurate crowd density heatmap output. Through extensive evaluations, we demonstrate that the accuracy of CSONet with scattered occlusion images outperforms over the state-of-art existing crowd counting approaches by 30% to 100% in both mean absolute error and mean square error.