Learning Similarity Functions for Topic Detection in Online Reputation Monitoring

Abstract

Reputation management experts must constantly monitor Twitter, among other sources, and determine at any given time what is being said about the entity of interest (a company, organization, personality, etc.). Solving this reputation monitoring problem automatically as a topic detection task is both essential, because manual processing of the data is costly or outright prohibitive, and challenging, because topics of interest for reputation monitoring are usually fine-grained and suffer from data sparsity. We focus on a solution that (i) learns a pairwise tweet similarity function from previously annotated data, using a range of content-based and Twitter-based features; and (ii) applies a clustering algorithm that uses the previously learned similarity function. Our experiments indicate that (i) Twitter signals improve topic detection compared with using content signals only; and (ii) learning a similarity function is a flexible and efficient way of introducing supervision into the topic detection clustering process. The performance of our best system is substantially better than state-of-the-art approaches and comes close to the inter-annotator agreement rate. A detailed qualitative inspection of the data further reveals two types of topics detected by reputation experts: reputation alerts / issues (which usually spike in time) and organizational topics (which are usually stable across time).
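
The two-step approach summarized in the abstract (learn a pairwise tweet similarity function from annotated data, then cluster with it) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three features (term overlap, shared hashtag, same author), the logistic regression learner, the threshold-based single-link clustering, the 0.5 threshold, and the toy tweets.

```python
import math
from itertools import combinations


def pair_features(a, b):
    """Hypothetical content-based and Twitter-based signals for a tweet pair."""
    ta, tb = set(a["text"].lower().split()), set(b["text"].lower().split())
    jaccard = len(ta & tb) / len(ta | tb) if ta | tb else 0.0  # content signal
    shared_tag = 1.0 if set(a["hashtags"]) & set(b["hashtags"]) else 0.0
    same_author = 1.0 if a["author"] == b["author"] else 0.0
    return [1.0, jaccard, shared_tag, same_author]  # leading 1.0 is the bias term


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_similarity(tweets, topics, epochs=500, lr=0.5):
    """Step (i): fit logistic regression on pairs labeled same-topic or not."""
    pairs = [
        (pair_features(tweets[i], tweets[j]), 1.0 if topics[i] == topics[j] else 0.0)
        for i, j in combinations(range(len(tweets)), 2)
    ]
    w = [0.0] * 4
    for _ in range(epochs):
        for x, y in pairs:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Stochastic gradient step on the log-likelihood.
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w


def similarity(w, a, b):
    """Learned similarity: estimated probability that a and b share a topic."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, pair_features(a, b))))


def cluster(tweets, w, threshold=0.5):
    """Step (ii): link pairs above the threshold and return component labels."""
    parent = list(range(len(tweets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(tweets)), 2):
        if similarity(w, tweets[i], tweets[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(tweets))]


# Toy annotated data (invented for this sketch).
train = [
    {"text": "battery recall announced today", "hashtags": ["recall"], "author": "u1"},
    {"text": "recall of battery units", "hashtags": ["recall"], "author": "u2"},
    {"text": "new store opening downtown", "hashtags": ["stores"], "author": "u3"},
    {"text": "store opening this weekend", "hashtags": ["stores"], "author": "u4"},
]
train_topics = ["recall", "recall", "stores", "stores"]

# Unseen tweets to be grouped into topics.
new_tweets = [
    {"text": "battery recall update", "hashtags": ["recall"], "author": "u5"},
    {"text": "recall notice for battery", "hashtags": ["recall"], "author": "u6"},
    {"text": "store opening hours", "hashtags": ["stores"], "author": "u7"},
]

weights = train_similarity(train, train_topics)
labels = cluster(new_tweets, weights)
print(labels)  # tweets with the same label were assigned to the same topic
```

In this sketch the supervision enters only through the learned weights, so the clustering step itself stays unsupervised and can be swapped for another algorithm without retraining the similarity function.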

Publication
Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval