Master 2017-2018
Internships of the SAR specialization
Automatic Drum Transcription using Deep Neural Networks

Site: Analysis/Synthesis Team, IRCAM
Location: IRCAM, 1 place Igor-Stravinsky, 75004 Paris
Supervisors: Celine Jacques / Axel Roebel
Dates: 01/02/2018 to 30/06/2018
Remuneration: 600 € / month + benefits (RATP transport tickets and meal vouchers)
Keywords: ATIAM track: Signal processing


==Context==

The Analysis/Synthesis team of IRCAM has a long history of studying automatic music transcription algorithms, covering the transcription of tonal [Yeh 2010] and percussive [Roebel 2015] instruments. The transcription of percussive instruments, generally called Automatic Drum Transcription (ADT), addresses the task of automatically annotating polyphonic music with drumbeat positions and drum type labels.
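To make the task definition concrete, the following minimal sketch represents an ADT annotation as a list of (time in seconds, drum label) pairs and scores an estimated annotation against a reference one with an onset-based F-measure. The labels `KD`, `SD` and `HH`, the function names, and the 50 ms tolerance window are assumptions chosen for illustration; they are not taken from the cited papers or from the team's tools.

```python
from collections import defaultdict


def match_onsets(reference, estimated, tolerance=0.05):
    """Count estimated onsets lying within `tolerance` seconds of a
    not-yet-matched reference onset (one-to-one, two-pointer matching)."""
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # estimated onset too early: false positive
        else:
            i += 1  # reference onset missed: false negative
    return hits


def f_measure(reference_events, estimated_events, tolerance=0.05):
    """Events are (time_in_seconds, drum_label) pairs.
    Onsets are only matched between events carrying the same label."""
    ref_by_label = defaultdict(list)
    est_by_label = defaultdict(list)
    for time, label in reference_events:
        ref_by_label[label].append(time)
    for time, label in estimated_events:
        est_by_label[label].append(time)

    labels = set(ref_by_label) | set(est_by_label)
    hits = sum(
        match_onsets(ref_by_label.get(l, []), est_by_label.get(l, []), tolerance)
        for l in labels
    )
    precision = hits / len(estimated_events) if estimated_events else 0.0
    recall = hits / len(reference_events) if reference_events else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f


# Hypothetical annotations: kick (KD), snare (SD), hi-hat (HH).
reference = [(0.50, "KD"), (1.00, "SD"), (1.50, "KD"), (1.50, "HH")]
estimated = [(0.52, "KD"), (1.00, "HH"), (1.48, "KD"), (1.51, "HH")]
precision, recall, f = f_measure(reference, estimated)
```

In this toy example the snare at 1.00 s is missed and a spurious hi-hat is reported at the same time, so three of the four events match on each side.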

Current research activities cover multi-pitch estimation using convolutional neural networks as well as drum and instrument transcription with non-negative matrix deconvolution. In this context, the internship aims to investigate recently emerging ADT algorithms based on convolutional and recurrent neural networks [Southall 2017, Vogl 2017, Wu 2017].


The selected student will perform a literature study of current trends in ADT with neural networks, propose new network structures, implement the proposed networks using, for example, TensorFlow [TF 2017], create appropriate training databases from public datasets of drum-annotated music, train the networks on the GPU cluster of the Analysis/Synthesis team, and evaluate their performance in comparison with existing algorithms. A major problem for training drum transcription algorithms is the limited availability of training data. The internship therefore also aims to investigate new means of creating synthetic annotated datasets in order to increase the size of the training datasets.
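As an illustration of what a synthetic annotated dataset could look like, the sketch below places crude artificial drum hits at random positions in an empty signal and records each onset time and label, so that the annotation is exact by construction. Everything here is an assumption made for the example: the sample rate, the labels `KD`, `SD` and `HH`, the envelope parameters, and the function names. It is not the method the internship will develop, and a realistic pipeline would use recorded drum samples mixed with other instruments.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sample rate in Hz

# Hypothetical hit parameters: (duration in s, decay rate in 1/s,
# sine frequency in Hz, or None for a noise burst).
HIT_PARAMS = {
    "KD": (0.20, 25.0, 60.0),  # kick: decaying low sine
    "SD": (0.15, 35.0, None),  # snare: decaying noise
    "HH": (0.05, 90.0, None),  # hi-hat: very short noise burst
}


def drum_hit(kind, rng, sample_rate=SAMPLE_RATE):
    """Return a crude synthetic stand-in for one drum hit as a list of floats."""
    duration, decay, freq = HIT_PARAMS[kind]
    hit = []
    for k in range(int(duration * sample_rate)):
        t = k / sample_rate
        envelope = math.exp(-decay * t)
        if freq is not None:
            source = math.sin(2.0 * math.pi * freq * t)
        else:
            source = rng.uniform(-1.0, 1.0)
        hit.append(envelope * source)
    return hit


def make_example(duration=2.0, n_events=8, seed=0, sample_rate=SAMPLE_RATE):
    """Build one training example: (signal, sorted list of (onset_s, label)).
    `duration` must exceed 0.25 s so that every hit fits inside the signal."""
    rng = random.Random(seed)
    signal = [0.0] * int(duration * sample_rate)
    annotations = []
    for _ in range(n_events):
        kind = rng.choice(sorted(HIT_PARAMS))
        onset = rng.uniform(0.0, duration - 0.25)  # leave room for the hit tail
        start = int(onset * sample_rate)
        for k, value in enumerate(drum_hit(kind, rng, sample_rate)):
            signal[start + k] += value  # overlapping hits simply add up
        annotations.append((start / sample_rate, kind))
    annotations.sort()
    return signal, annotations


signal, annotations = make_example()
```

Because the seed fixes both the hit positions and the noise, the same call always reproduces the same example, which makes it easy to regenerate a dataset instead of storing it.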


[Wu 2017] C.-W. Wu and A. Lerch (2017). « Automatic Drum Transcription using the student-teacher paradigm with unlabeled music data », Proc. Int. Society for Music Information Retrieval Conf. (ISMIR), pp. 613-620.

[Southall 2017] C. Southall, R. Stables and J. Hockman (2017). « Automatic Drum Transcription for Polyphonic Recordings using Soft Attention Mechanisms and Convolutional Neural Networks », Proc. Int. Society for Music Information Retrieval Conf. (ISMIR), pp. 606-612.

[Vogl 2017] M. Vogl et al. (2017). « Drum transcription from polyphonic music with recurrent Neural Networks », Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP).

[TF 2017]

[Roebel 2015] A. Roebel, J.P. Puig et al. (2015). « On automatic drum transcription using non-negative matrix deconvolution and Itakura-Saito divergence », Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 414-418.

[Yeh 2010] C. Yeh, A. Roebel and X. Rodet (2010). « Multiple Fundamental Frequency Estimation and Polyphony Inference of Polyphonic Music Signals », IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 6, pp. 1116-1126.