Master 2014-2015
Internships of the SAR specialization
Online Audio Pre-Clustering

Site: MuTant
Location: Ircam (Paris)
Supervisors: Arshia Cont (EPC MuTant, Ircam), Mathieu Lagrange (IRCCYN, Ecole Centrale de Nantes)
Dates: 07/04/2015 to 31/08/2015
Stipend: 430 euros / month
Keywords: ATIAM track: Signal processing


Context:

Audio and music signals are highly structured: in most cases they consist of recurring events, and this recurrence constitutes the structure of the audio in question. Discovering such structures is a common task for listeners, trained or untrained, whereas the automatic discovery of audio structures remains a challenging task in Music Information Retrieval. The task is even more challenging for computers when done online (in real time, as the audio arrives into the system), whereas it comes naturally to human listeners.

The very front-end of such systems consists of segmentation or clustering algorithms for audio and music. In this approach, audio signals arrive into the system in small frames and are segmented into larger contiguous chunks, each describing a section that is homogeneous in terms of audio content. Ideally, segments at different time indexes are also connected through similarity. For example, the work in [3] detects homogeneous chunks in incoming audio using a change detection mechanism, and the work in [2] classifies these chunks in a second pass, based on their similarities, using methods of information geometry.

The goal of this Master's thesis project is to study and address the problem of segmenting and classifying audio streams in a single pass, by translating the change detection step of [3] and the classification step of [2] into a unified framework employing hidden semi-Markov formalisms [4]. The idea is to treat classes and states as equivalent, to add a state whenever a new event is detected, and in parallel to check whether the incoming class is equivalent to an existing class.
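The single-pass idea above can be illustrated with a minimal sketch. It is not the method of [2], [3] or [4]: it assumes a plain Euclidean distance between feature frames in place of information-geometric measures, it models no segment durations, and the function name `online_precluster` and both thresholds are hypothetical choices made for illustration only.

```python
import numpy as np


def online_precluster(frames, change_thresh=2.0, match_thresh=2.0):
    """Label feature frames in a single pass (illustrative sketch only).

    Each class (state) is summarized by the running mean of its frames.
    A change is declared when a frame lies farther than `change_thresh`
    from the current class mean; the frame is then matched against all
    known classes, and a new class is added if none is within
    `match_thresh`. Euclidean distance is an assumption of this sketch.
    """
    centroids = []  # running mean of each class
    counts = []     # number of frames absorbed by each class
    labels = []     # class index assigned to each frame
    current = -1    # index of the active class, -1 before the first frame

    for x in frames:
        x = np.asarray(x, dtype=float)
        changed = current < 0 or np.linalg.norm(x - centroids[current]) > change_thresh
        if changed:
            best, best_dist = -1, np.inf
            for k, c in enumerate(centroids):
                d = np.linalg.norm(x - c)
                if d < best_dist:
                    best, best_dist = k, d
            if best >= 0 and best_dist <= match_thresh:
                current = best            # equivalent to an existing class
            else:
                centroids.append(x.copy())  # new event: add a state
                counts.append(0)
                current = len(centroids) - 1
        # Incremental mean update of the active class.
        counts[current] += 1
        centroids[current] += (x - centroids[current]) / counts[current]
        labels.append(current)
    return labels, centroids


# Toy stream with an A-B-A structure in a 2-D feature space.
stream = [[0.0, 0.0]] * 20 + [[5.0, 5.0]] * 20 + [[0.1, 0.0]] * 20
labels, centroids = online_precluster(stream)
print(labels)
print(len(centroids))
```

On this toy stream the third section is attached to the class created for the first one rather than opening a new state, which is the recurrence the project aims to capture. A semi-Markov version would additionally model how long each state is expected to last.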

This Master's project will be undertaken in two steps:

(a) Formalize an experimental protocol in which the methods described above can be evaluated independently of the model choices, in a task called pre-clustering.

(b) Study the properties of several online algorithms and their corresponding semi-Markovian versions. This step will build upon work and code provided by Alberto Bietti [1]. Based on those results, new research directions will be taken, targeting in particular the use of non-Gaussian priors for the duration of the events.

A major goal of this project is to use a generic representational front-end that does not limit the system to a certain type of sound or music, and to focus on the robustness of the learning algorithms in question. The success of each step of the project will be evaluated on various music databases available at Ircam and in the MIREX community, as well as on the datasets of the recent IEEE Detection and Classification of Acoustic Scenes and Events challenge. We expect the results to be published in major Music Information Retrieval and Machine Learning conferences.


[1] Alberto Bietti. Online learning for audio clustering and segmentation. ATIAM Master Thesis, UPMC / MuTant, 2014.

[2] Vincent Lostanlen. Découverte automatique de structures musicales en temps réel par la géométrie de l’information. ATIAM Master Thesis, UPMC / MuTant, 2013.

[3] Arnaud Dessein, Arshia Cont. An information-geometric approach to real-time audio segmentation. IEEE Signal Processing Letters, 20(4), pp. 331-332, 2013.

[4] S.-Z. Yu. Hidden semi-Markov models. Artificial Intelligence, 174(2), pp. 215-243, 2010.