Master 2017–2018
Internships of the SAR specialty
Learning Audio Representation from Usage Data

Location: Deezer HQ, Paris
Supervisors: Viet-Anh Tran & Jimena Royo-Letelier
Dates: from 01/03/2018 to 30/08/2018
Compensation: €1,300/month (gross)
Keywords: ATIAM track: Music computing, ATIAM track: Signal processing



Deezer is one of the leading companies in the music streaming industry, with one of the largest catalogs on the market (over 35 million titles) and over 30 million active users across more than 180 countries. In this industry, recommendation is a key component for retaining existing users and attracting new ones. It helps users actively explore a vast and mostly unknown musical landscape, and it is central to enjoyable passive experiences that rely on generated, personalized content.

Traditional music recommendation engines are based on Collaborative Filtering (CF), the most successful approach to a wide variety of recommendation tasks, covering not only music but also games, books, movies, and more. Systems based on collaborative filters exploit the "wisdom of crowds" to infer usage-based similarities between items, and recommend new items to users by representing and comparing items in terms of the people who consume them. While CF techniques perform well when usage data is available for each item, they suffer from the "cold start" problem: a new item cannot be recommended until it has been consumed, and it is less likely to be consumed if it is never recommended, especially with a catalog of millions of titles. As a result, only a tiny fraction of songs can be recommended, making it difficult for users to explore and discover new music. To tackle this challenge, one can rely on similarity measures over audio content representations (low-dimensional features that capture high-level musical properties of the audio signal), which naturally extend to novel items.
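To make the CF setting above concrete, here is a minimal NumPy sketch of alternating least squares on implicit feedback, in the spirit of Hu et al. (2008, cited below). The function name `als_implicit` and all hyperparameter values are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def als_implicit(R, k=8, alpha=40.0, reg=0.1, iters=10, seed=0):
    """Implicit-feedback ALS sketch: factorize a play-count matrix R
    (users x items) into user factors X and item factors Y, weighting
    each observation by a confidence c = 1 + alpha * count."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, k))
    Y = rng.normal(scale=0.1, size=(n_items, k))
    C = 1.0 + alpha * R           # confidence in each (user, item) cell
    P = (R > 0).astype(float)     # binary preference target
    for _ in range(iters):
        # Alternate: solve user factors with items fixed, then vice versa.
        for U, V, Cm, Pm in ((X, Y, C, P), (Y, X, C.T, P.T)):
            VtV = V.T @ V
            for i in range(U.shape[0]):
                Ci = Cm[i]
                A = VtV + V.T @ ((Ci - 1.0)[:, None] * V) + reg * np.eye(k)
                b = V.T @ (Ci * Pm[i])
                U[i] = np.linalg.solve(A, b)
    return X, Y
```

Recommendation scores are then `X @ Y.T`; a row of that matrix ranks the catalog for one user, and the cold-start problem shows up as any item whose column of `R` is all zeros.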

This internship will focus on exploring new techniques for learning audio content representations from samples of usage data, covering both theoretical aspects and practical applications. To this end, several approaches will be studied, such as:

  • Mapping functions: from one representation to another
  • Metric learning: a similarity function in one representation is learned from samples in another representation
  • Shared representation learning across both modalities
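As a minimal illustration of the first approach above, one can fit a linear (ridge-regression) map from audio feature vectors to the CF item factors of already-consumed tracks, so that a brand-new track can be scored against user factors from its audio alone (in the spirit of van den Oord et al., cited below). The name `fit_audio_to_cf` and the feature/factor dimensions are assumptions for this sketch:

```python
import numpy as np

def fit_audio_to_cf(audio_feats, item_factors, reg=1.0):
    """Ridge regression from audio features (n_items x d) to CF item
    factors (n_items x k). A cold-start track with feature vector `a`
    then gets the pseudo-embedding a @ W, comparable to real CF factors."""
    d = audio_feats.shape[1]
    gram = audio_feats.T @ audio_feats + reg * np.eye(d)
    return np.linalg.solve(gram, audio_feats.T @ item_factors)  # (d, k)
```

Given user factors `X` from a CF model, a new track with audio features `a` is scored as `X @ (a @ W)`; nonlinear mappings or the metric- and shared-representation approaches replace this linear map with a learned network.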

The intern will be supervised by research scientists and engineers from the R&D team at Deezer, who will provide material and theoretical support for the proposed task, as well as cutting-edge technology and appropriate computing power. The intern will be encouraged to propose solutions and to work autonomously.


Master's student with a strong background in machine learning and software development experience.


  • Strong machine learning knowledge
  • Knowledge of recommender systems and music information retrieval is a plus
  • Confidence with the Python language
  • Curiosity, autonomy and motivation


Y. Hu, C. Volinsky and Y. Koren, Collaborative filtering for implicit feedback datasets, ICDM Proceedings, 2008.

S. Rendle, C. Freudenthaler, Z. Gantner and L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, AUAI Press, 2009.

A. van den Oord, S. Dieleman and B. Schrauwen, Deep content-based music recommendation, NIPS Proceedings, 2013.

S. Oramas, O. Nieto, M. Sordo and X. Serra, A Deep Multimodal Approach for Cold-start Music Recommendation, RecSys Conference, 2017.

B. McFee, L. Barrington and G. Lanckriet, Learning content similarity for music recommendation, ISMIR, 2007.

F. Schroff, D. Kalenichenko and J. Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.

J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee and A. Y. Ng, Multimodal Deep Learning, in Proceedings of ICML, 2011.