“WTF: the who to follow service at Twitter”
Date: Wednesday, July 24th, 17:30
Location: plista, Torstraße 33, 10119 Berlin, Deutschland
Right after the presentation we will walk together to the Prater Garden to get some more drinks.
WTF (“Who to Follow”) is Twitter’s user recommendation service, which is responsible for creating millions of connections daily between users based on shared interests, common connections, and other related factors. This paper provides an architectural overview and shares lessons we learned in building and running the service over the past few years. Particularly noteworthy was our design decision to process the entire Twitter graph in memory on a single server, which significantly reduced architectural complexity and allowed us to develop and deploy the service in only a few months. At the core of our architecture is Cassovary, an open-source in-memory graph processing engine we built from scratch for WTF. Besides powering Twitter’s user recommendations, Cassovary is also used for search, discovery, promoted products, and other services. We describe and evaluate a few graph recommendation algorithms implemented in Cassovary, including a novel approach based on a combination of random walks and SALSA. Looking to the future, we revisit the design of our architecture and comment on its limitations, which are presently being addressed in a second-generation system under development.
About Jimmy:
Jimmy Lin is an associate professor in the iSchool at the University of Maryland, affiliated with the Department of Computer Science and the Institute for Advanced Computer Studies. He graduated with a Ph.D. in computer science from MIT in 2004. Lin’s research lies at the intersection of information retrieval and natural language processing, and he has done work in a variety of areas, including question answering, medical informatics, and bioinformatics. Lin’s current research focuses on massively distributed data analytics in cluster-based environments.
Lin recently completed an extended sabbatical at Twitter, where from 2010 to 2012 he worked on services designed to surface relevant content for users and on the distributed infrastructure that supports mining relevance signals from massive amounts of data.