Abstract
In recent years, there has been increased interest in real-world event identification using data collected from social media, where the Web enables the general public to post real-time reactions to terrestrial events, thereby acting as social sensors of terrestrial activity. Automatically extracting and categorizing activity from streamed data is a non-trivial task. To address this task, we present a novel event detection framework comprising five main components: data collection, pre-processing, classification, online clustering and summarization. The integration of classification and clustering allows events to be detected, including "disruptive" events: incidents that threaten social safety and security, or could disrupt the social order. We evaluate our framework on a large-scale, real-world dataset from Twitter. We also compare our results to other leading approaches using the Flickr MediaEval Event Detection Benchmark.