Join us for the 15th incarnation of the Recommender Stammtisch hosted by plista. The event will take place on Nov 13th, starting at 7pm. Please register here.
We will have two talks:
Torben Brodt: “Latest in large scale recommendation engines and machine learning” – Torben will present some of the latest developments in large scale recommendation engines and machine learning from the last RecSys conference.
Sebastian Schelter: “Factorbird – a Parameter Server Approach to Distributed Matrix Factorization” – We present ‘Factorbird’, a prototype of a parameter server approach for factorizing large matrices with Stochastic Gradient Descent-based algorithms. We designed Factorbird to meet the following desiderata: (a) scalability to tall and wide matrices with dozens of billions of non-zeros, (b) extensibility to different kinds of models and loss functions as long as they can be optimized using Stochastic Gradient Descent (SGD), and (c) adaptability to both batch and streaming scenarios. Factorbird uses a parameter server in order to scale to models that exceed the memory of an individual machine, and employs lock-free Hogwild!-style learning with a special partitioning scheme to drastically reduce conflicting updates. We also discuss other aspects of the design of our system such as how to efficiently grid search for hyperparameters at scale. We present experiments of Factorbird on a matrix built from a subset of Twitter’s interaction graph, consisting of more than 38 billion non-zeros and about 200 million rows and columns, which is to the best of our knowledge the largest matrix on which factorization results have been reported in the literature.
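For readers new to the topic, the core idea behind SGD-based matrix factorization mentioned in the abstract can be sketched in a few lines. This is a minimal single-machine illustration in plain Python, not Factorbird itself: there is no parameter server, no Hogwild!-style parallelism and no partitioning scheme. The function names, the squared-error loss with L2 regularization, and all hyperparameter values are assumptions chosen for the example.

```python
import random


def predict(U, V, i, j):
    """Predicted value for cell (i, j): dot product of row and column factors."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def rmse(entries, U, V):
    """Root mean squared error over the observed (non-zero) entries."""
    total = sum((value - predict(U, V, i, j)) ** 2 for i, j, value in entries)
    return (total / len(entries)) ** 0.5


def factorize(entries, num_rows, num_cols, rank=2, lr=0.02, reg=0.01,
              epochs=1000, seed=0):
    """Factorize a sparse matrix given as (row, col, value) triples with SGD.

    Hypothetical helper for illustration: squared-error loss with L2
    regularization, one sequential pass over shuffled entries per epoch.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_cols)]
    entries = list(entries)
    for _ in range(epochs):
        rng.shuffle(entries)
        for i, j, value in entries:
            err = value - predict(U, V, i, j)
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on both factors, using the old values of each.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


# Toy data: six observed cells of a 3 x 3 matrix.
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]
U0, V0 = factorize(observed, 3, 3, epochs=0)   # untrained factors
U1, V1 = factorize(observed, 3, 3)             # trained factors
print(rmse(observed, U0, V0), rmse(observed, U1, V1))
```

Each update touches only one row factor and one column factor, which is the property that lock-free parallel schemes exploit: updates for cells in different rows and columns do not conflict.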
It’s summer, and we thought it would be a good idea to hold the next Stammtisch in a beer garden. Mohamed Sordo is in Berlin and will join us.
We’ll meet next Wednesday, August 13th, at 7pm at Prater Biergarten. We will tweet exactly where we are sitting, so follow us – @RecSysDE
We had an awesome Stammtisch last week with interesting talks and discussions. Thanks to Zalando for hosting the event.
For future reference, here are the slides of the three talks:
Zalando will host the 13th Recommender Stammtisch on Thursday, June 26.
Join us starting from 8:00 pm (6:00 pm if you want to watch the match there) in the Zalando office in Mollstraße 1, 10178 Berlin.
We are looking forward to the following talks:
- Boris von Loesch (Zalando): Recommendations at Zalando
- Ulf Brefeld (TU Darmstadt): Sequential Recommender Systems
- Mikio Braun (streamdrill): What is scalable Machine Learning? — Scalability is one of the key concepts in Big Data, although historically speaking, it has meant quite a different thing in Machine Learning. In this talk, I’ll discuss different aspects of large scale learning and how it relates to map reduce, stochastic gradient descent, next generation big data frameworks like Spark and Stratosphere/Flink, as well as algorithmic approaches to scalability like approximation, and stream mining.
- Lukas Lerche (TU Dortmund): Recommendations for Short-Term Shopping Goals
- 6:00 pm – Join us for the World Cup match
- 8:00 pm – Doors open
- 8:20 pm – Talks start
- Later we will have snacks and beers.
Register here for free.
We hope to see you there!
We are starting later this time so that you can get the best of both worlds: watch the World Cup match Germany-USA at 6:00 pm and still have enough time to make it to the Stammtisch.
We will watch the match together on a projector at the Zalando office before the Stammtisch. You are welcome to join us for that as well!
Thanks a lot to the awesome people at ResearchGate for organizing another great Stammtisch at their office in Berlin-Mitte.
The 2 talks were informative and well-received by the audience. In between we had pizza, and after the talks there was plenty of time for talking, table football and table tennis.
Earlier this year, we met at SoundCloud to attend 3 talks by Özgür, Alex, and Lucía.
For future reference, here are their slides:
This May, the Recommender Stammtisch returns to ResearchGate. We’re looking forward to two exciting talks by Sebastian Schelter and Stefan Savev (please see details below), followed by a happy hour.
Join us on May 22nd at 7:00 pm at the ResearchGate headquarters on Invalidenstraße 115, 10115 Berlin (close to “Naturkundemuseum” on the U6 line and to “Nordbahnhof”).
Looking forward to seeing you there! Register for free!
Sebastian Schelter: Cooccurrence-based recommendations with Mahout, Scala & Spark
This talk will give a preview of the latest developments in Apache Mahout. Mahout features a new Scala DSL for linear algebraic computations. Programs written in this DSL are automatically parallelized and executed on Apache Spark. I will give an introduction to the DSL and show how Mahout uses it to implement a cooccurrence-based recommender.
Sebastian is a PhD student at the Database Systems and Information Management Group of TU Berlin as well as a member of the Apache Software Foundation, where he works on Mahout and Giraph.
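As a rough illustration of what “cooccurrence-based” means in Sebastian’s abstract, here is a minimal sketch in plain Python. It is not Mahout’s Scala DSL and does not run on Spark; the function names and the toy data are assumptions made for this example, and it scores items by raw cooccurrence counts, whereas real systems usually apply a statistical weighting on top.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(histories):
    """Count, for every pair of items, how many users interacted with both."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, histories, counts, top_n=3):
    """Score unseen items by summed cooccurrence with the user's items."""
    seen = set(histories[user])
    scores = defaultdict(int)
    for item in seen:
        for other, count in counts[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Toy interaction histories (hypothetical users and items).
histories = {
    "alice": ["a", "b"],
    "bob": ["a", "b", "c"],
    "carol": ["b", "c", "d"],
    "dave": ["a", "c"],
}
counts = cooccurrence_counts(histories)
print(recommend("alice", histories, counts))  # ['c', 'd']
```

Counting pairs per user is equivalent to multiplying the transposed user-item matrix by itself, which is why a linear algebra DSL is a natural fit for this kind of recommender.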
Stefan Savev: Information Retrieval Models: Explanations and Applications
Did you ever come across terms like BM25 similarity model, KL divergence model, the language model, and the translation model in ElasticSearch or Solr documentation? Did you wonder what these models do and whether they can improve your product?
In this talk we revisit the classical information retrieval approaches and explain the thinking and intuition behind the models so that you can decide whether they fit your use case. We also show some useful extensions handling common cases such as matching the query across multiple fields, handling spelling errors, synonyms and personalized context.
Stefan works as a Senior Software Engineer at ResearchGate, where he focuses on developing their recommendation system. Previously, Stefan worked as a Software Engineer for relevance at Microsoft. He holds a PhD in Information Retrieval from Northeastern University, Boston and an MSc degree in Natural Language Processing from RWTH, Aachen. Stefan loves to discuss topics related to algorithms, search engine implementation, functional programming languages and machine learning.
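To give a feel for one of the models named in Stefan’s abstract, the following is a minimal sketch of BM25 scoring in plain Python. It is an illustration only, not the ElasticSearch or Solr implementation: tokenization is a simple lowercase whitespace split, the IDF formula is one common variant, and the parameter values `k1=1.2` and `b=0.75` are conventional defaults assumed for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query string."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat",
    "dogs chase the cat",
    "stock markets fell sharply",
]
scores = bm25_scores("cat mat", docs)
print(scores)
```

The first document matches both query terms and ranks highest, the second matches only the more common term, and the third matches nothing and scores zero.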
Followed by happy hour.
We’re back at SoundCloud. SoundCloud will host the next Recommender Stammtisch on Thursday, February 27th.
We’re planning to have an evening of mirth, pizza, RecSys talks, and more mirth. Doors open at 7:00 pm at SoundCloud’s offices at Greifswalder Str 212, 5th floor, Berlin.
We will have talks from:
- Özgür Demir – SoundCloud – Recommendations at SoundCloud
Since its foundation, SoundCloud has become one of the major platforms for user-generated audio content. The uniqueness of the uploaded content, together with its sheer mass, makes it very difficult for the end user to find relevant content. Hence, fully automated recommendations become a crucial part of an outstanding user experience. Various projects at SoundCloud currently deal with personalized as well as unpersonalized recommendations and related topics, e.g. content classification. This talk will give a brief overview of those projects and the technologies used.
- Alexandros Karatzoglou – Telefonica Research – Ranking and Diversity in Recommendations (also meet Okapi)
Most users only pay attention to the top 5 to 10 recommendations (on mobile even fewer), so it is very important to get these recommendations right. Ranking algorithms can help achieve this by using most of the modelling power to get the most relevant items to the top of the recommendation list.
I will give a short overview of the ranking techniques that we have developed over the last couple of years and the main idea behind them. Recommendations should also be interesting and potentially allow users to discover new content and perhaps even expand their preferences.
In the second part of the talk, I will focus on diversifying recommendations, the challenges and the ways we tackle them. I would also like to introduce a new Open Source project for Machine Learning and Recommendations in Giraph/Hadoop called Okapi. Okapi provides a range of methods for Collaborative Filtering and Social Network Analysis and is released under the Apache license.
- Christoph Lingg – komoot – Recommender Use Case at komoot
komoot is your personal guide for cycling and hiking tours. Christoph Lingg will give a short introduction to recommender use cases at komoot and their current recommendation techniques.
Please register for this event (it is free).
ResearchGate will host the next Recommender Stammtisch on Thursday, November 14th.
We are looking forward to an exciting talk by Andreas Lommatzsch on online real-time recommendations (details below), followed by a happy hour.
Doors open at 7:30 pm at ResearchGate’s offices at Invalidenstraße 115, 10115 Berlin (close to Naturkundemuseum on the U6 line and to Nordbahnhof). Our Recommender Stammtisch follows the first day of the Stratosphere Summit. If you are there, come on by and join us afterwards.
Looking forward to seeing you there!
7:30 pm Doors open
7:45 pm Welcome words by Ijad Madisch, CEO of ResearchGate
8:00 pm Andreas Lommatzsch: Online News Recommendations in the NRS Challenge
Andreas will give a gentle introduction to the standard toolkit of recommender algorithms and will present ways to adapt them to the requirements of online recommendations in streams.
Andreas Lommatzsch is a postdoc at the Distributed Artificial Intelligence Laboratory, where he focuses on hybrid recommenders for heterogeneous semantic data. He is also one of the heads behind the ACM RecSys 2013 News Recommender System Challenge (NRS).
Followed by happy hour.
Please register at Eventbrite for this event (it is free).
Have you already registered for the Prater Garden? “Back to the roots”
We have another special announcement: Jimmy Lin (@lintool) is in Germany and he will give a talk about:
“WTF: the who to follow service at Twitter”
Date: Wednesday, July 24th, 17:30
Location: plista, Torstraße 33, 10119 Berlin, Deutschland
Right after the presentation we will walk together to the Prater Garden to get some more drinks.
WTF (“Who to Follow”) is Twitter’s user recommendation service, which is responsible for creating millions of connections daily between users based on shared interests, common connections, and other related factors. This paper provides an architectural overview and shares lessons we learned in building and running the service over the past few years. Particularly noteworthy was our design decision to process the entire Twitter graph in memory on a single server, which significantly reduced architectural complexity and allowed us to develop and deploy the service in only a few months. At the core of our architecture is Cassovary, an open-source in-memory graph processing engine we built from scratch for WTF. Besides powering Twitter’s user recommendations, Cassovary is also used for search, discovery, promoted products, and other services as well. We describe and evaluate a few graph recommendation algorithms implemented in Cassovary, including a novel approach based on a combination of random walks and SALSA. Looking into the future, we revisit the design of our architecture and comment on its limitations, which are presently being addressed in a second-generation system under development.
About Jimmy (Web):
Jimmy Lin is an associate professor in the iSchool at the University of Maryland, affiliated with the Department of Computer Science and the Institute for Advanced Computer Studies. He graduated with a Ph.D. in computer science from MIT in 2004. Lin’s research lies at the intersection of information retrieval and natural language processing, and he has done work in a variety of areas, including question answering, medical informatics, and bioinformatics. Lin’s current research focuses on massively-distributed data analytics in cluster-based environments.
Lin recently completed an extended sabbatical at Twitter, where from 2010 to 2012 he worked on services designed to surface relevant content for users and the distributed infrastructure that supports mining relevance signals from massive amounts of data.