Sub-story detection in Twitter with hierarchical Dirichlet processes

PK Srijith, Mark Hepple, Kalina Bontcheva, Daniel Preotiuc-Pietro
Abstract Social media has now become the de facto information source on real-world events. The challenge, given the high volume and velocity of social media streams, is how to follow all posts pertaining to a given event over time, a task referred to as story detection. Moreover, there are often several different stories pertaining to a given event, which we refer to as sub-stories; the corresponding task of detecting them automatically is sub-story detection. This paper proposes hierarchical Dirichlet processes ( ...