Master 2018-2019
Internships of the SAR specialization
Hierarchical variational temporal learning for dynamic musical audio synthesis

Location: IRCAM, Musical Representations team
Supervisors: Philippe Esling, Axel Chemla--Romeu-Santos
Dates: 18/02/2019 to 18/08/2019
Remuneration: IRCAM standard rate
Keywords: ATIAM track: Computer music



Generative systems are machine-learning models whose training is based on two simultaneous optimization tasks. The first is to build a latent space that provides a low-dimensional representation of the data, possibly subject to various regularizations and constraints. The second is to reconstruct the original data by sampling from this latent space. These systems are very promising because their latent space is a high-level, "over-compressed" representation that can serve as an intermediate space for several tasks, such as visualization, measurement, or classification.
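To make the two optimization tasks concrete, here is a minimal sketch of the training objective in the variational auto-encoder formulation of Kingma and Welling [1]. It is an illustration only, not the model used in this project. It assumes a diagonal Gaussian posterior, a standard normal prior, and a squared-error reconstruction term. The function names and the `beta` weight are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sample_latent(mu: Sequence[float], log_var: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float],
                          log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ): the latent regularizer."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Squared error between input and reconstruction (assumed likelihood)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x: Sequence[float], x_hat: Sequence[float],
             mu: Sequence[float], log_var: Sequence[float],
             beta: float = 1.0) -> float:
    """Sum of both tasks: reconstruction plus weighted regularization."""
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)
```

In a real system, `mu` and `log_var` would come from an encoder network and `x_hat` from a decoder applied to `sample_latent(...)`. Both networks are omitted here.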

However, one of the most prevalent problems of machine-learning algorithms applied to musical creativity is that they process only a single temporal scale or, at best, a finite set of small scales. The goal of this project is to develop an approach that can process multiple temporal granularities through hierarchical multi-scale processing. The main goal is therefore to develop a recursive form of learning that iteratively learns signals of increasing temporal complexity. A first approach towards this idea is to learn a variational latent space on small chunks of audio (audio grains) directly from the raw waveform, and then iteratively build latent spaces for longer and more complex audio samples.
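The recursive idea can be sketched as follows: encode short grains first, then treat groups of consecutive grain codes as the input of the next level. This is only an illustration of the data flow under strong assumptions. `toy_encoder` is a hypothetical stand-in for a trained variational encoder, and the grain size, group size, and number of levels are arbitrary placeholders.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_chunks(seq: Sequence, size: int) -> List[list]:
    """Cut seq into consecutive non-overlapping chunks of `size` items.

    An incomplete chunk at the end is dropped.
    """
    return [list(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]


def toy_encoder(chunk: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a trained encoder: returns a 2-D code
    (mean value, mean absolute value) instead of a learned latent vector."""
    n = len(chunk)
    return [sum(chunk) / n, sum(abs(v) for v in chunk) / n]


def hierarchical_encode(signal: Sequence[float], grain_size: int,
                        group_size: int, levels: int,
                        encoder: Callable[[Sequence[float]], Vector] = toy_encoder
                        ) -> List[List[Vector]]:
    """Return one list of codes per level, from short grains to long spans."""
    # Level 0: encode raw audio grains.
    codes = [encoder(grain) for grain in split_into_chunks(signal, grain_size)]
    hierarchy = [codes]
    for _ in range(levels - 1):
        groups = split_into_chunks(codes, group_size)
        if not groups:
            break  # not enough lower-level codes to form another level
        # Each group of lower-level codes is flattened into one input vector,
        # so a higher-level code summarizes a longer stretch of audio.
        codes = [encoder([v for code in group for v in code]) for group in groups]
        hierarchy.append(codes)
    return hierarchy
```

With a 64-sample signal, `grain_size=4` and `group_size=4`, each code at the second level covers 16 samples and the single code at the third level covers all 64.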

The goal of this internship is both to provide a generative system based on raw audio and to evaluate its use and control for musical creativity.


[1] Diederik P. Kingma and Max Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.
[2] Lei Cai, Hongyang Gao, and Shuiwang Ji, “Multi-stage variational auto-encoders for coarse-to-fine image generation,” arXiv preprint arXiv:1705.07202, 2017.
[3] Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko, “Semi-supervised learning with ladder networks,” in Advances in Neural Information Processing Systems, 2015, pp. 3546–3554.
[4] Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther, “Ladder variational autoencoders,” in Advances in Neural Information Processing Systems, 2016, pp. 3738–3746.