Abstract
Software developers are often using instant messaging platforms to communicate with each other and other stakeholders. Among these platforms, Gitter has emerged as a popular choice and the messages it contains can reveal important information to researchers studying open source software systems. Uncovering what developers are communicating about through Gitter is an essential first step towards successfully understanding and leveraging this information. In this paper, we first describe the largest manually labeled and curated dataset of Gitter developer messages, named
GitterCom
, obtained by manually analyzing and labeling 10,000 Gitter messages in 10 software projects. We then present a qualitative study to understand the extent to which the categories identified in previous work by Lin et al. (
2016
) found on Slack through surveys are applicable to developer messages exchanged on Gitter. Further, in an effort to automate the labeling process, we investigate the accuracy of 9 traditional machine learning and deep learning algorithms in predicting the intent of Gitter messages. We found that Decision Trees and Random Forest performed the best, achieving an accuracy of 88%, which is very promising for this multi-class classification task. Finally, we discuss the potential directions for future research enabled by labeled Gitter datasets such as
GitterCom
.