Master 2018 2019
Internships of the SAR specialization
Automatic Piano Transcription using Deep Neural Networks

Site : Automatic Piano Transcription using Deep Neural Networks
Location : Analysis/Synthesis team, IRCAM, 1, place Igor-Stravinsky, 75004 Paris
Supervisors : Axel Roebel, Remi Mignot
Dates : from 01/02/2019 to 30/06/2019
Remuneration : 600€ / month + benefits (RATP transport tickets and meal vouchers)
Keywords : ATIAM track : Signal processing


Context :

The Analysis/Synthesis team of IRCAM has a long history of studying automatic music transcription algorithms, covering the transcription of tonal [Yeh 2010] and percussive [Roebel 2015, Jacques 2018] instruments. Recent research on the transcription of polyphonic piano music is strongly dominated by algorithms based on deep learning [Sigtia 2016, Wang 2017, Hawthorne 2018]. A central precondition for the successful application of deep learning methods to transcription is the availability of sufficiently large, reliably annotated datasets. For piano transcription, the MAPS database [Emiya 2010] is used almost exclusively. Unfortunately, due to weaknesses of the DisKlavier that was used to generate this dataset, the ground truth contains significant offsets and missing notes.

Objectives :

The aim of the present internship is to develop an automatic approach to generating precisely annotated piano transcription datasets using DisKlaviers. The method will split the MIDI tracks appropriately, such that the major weaknesses of existing DisKlaviers can either be avoided or compensated for in a post-processing stage. The methods will be tested in practice using the DisKlavier available at IRCAM. The generated datasets will be used to evaluate the impact of the deficiencies of the MAPS database on state-of-the-art piano transcription methods. All implementations will be written in Python using the TensorFlow framework [TF 2017]. The networks will be trained on the GPU cluster of the Analysis/Synthesis team.
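
As an illustration of what compensation in a post-processing stage could look like, the following is a minimal sketch only, not the method to be developed during the internship. It assumes that the annotation error can be approximated by a single constant onset offset, that onset times measured from the recorded audio are already available, and that pitch can be ignored when matching; none of these assumptions comes from the project description. The function names, inputs and the 100 ms tolerance are hypothetical.

```python
from bisect import bisect_left
from statistics import median


def nearest(sorted_times, t):
    """Return the value in sorted_times closest to t (list must be non-empty)."""
    i = bisect_left(sorted_times, t)
    candidates = sorted_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda x: abs(x - t))


def compensate_offset(annotated, detected, tolerance=0.1):
    """Estimate one global onset offset (in seconds) and shift the annotations.

    annotated: onset times taken from the MIDI ground truth (hypothetical input)
    detected:  onset times measured in the recorded audio (hypothetical input)
    tolerance: maximum distance for an annotation to count as matched (assumption)

    Returns (offset, corrected_annotations, unmatched_annotations).
    """
    detected = sorted(detected)
    diffs, unmatched = [], []
    for t in annotated:
        d = nearest(detected, t) - t
        if abs(d) <= tolerance:
            diffs.append(d)
        else:
            # No detected onset close to this annotation:
            # candidate for a note that was annotated but not played.
            unmatched.append(t)
    # The median is robust against a few wrong matches.
    offset = median(diffs) if diffs else 0.0
    corrected = [t + offset for t in annotated]
    return offset, corrected, unmatched


if __name__ == "__main__":
    annotated = [1.0, 2.0, 3.0, 4.0]
    detected = [1.03, 2.03, 3.03]
    offset, corrected, unmatched = compensate_offset(annotated, detected)
    print(offset, corrected, unmatched)
```

A real system would presumably need per-note or per-pitch corrections rather than one global shift; the sketch only shows the general idea of comparing the annotation against the recording and flagging annotations that have no counterpart.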

The tools for automated database creation implemented during the internship should be made publicly available, with the aim of triggering a community effort towards the construction of a new reference piano database built entirely with the DisKlaviers available in the various research teams interested in piano transcription.


[Emiya 2010] V. Emiya et al., “MAPS - A piano database for multipitch estimation and automatic transcription of music”, Inria Research Report, 2010.

[Hawthorne 2018] C. Hawthorne et al., “Onsets and Frames: Dual-Objective Piano Transcription”, Proc. ISMIR, 2018.

[Jacques 2018] C. Jacques and A. Roebel, “Automatic drum transcription with convolutional neural networks”, Proc. Int. Conf. on Digital Audio Effects (DAFx), pp. 80-86, 2018.

[Roebel 2015] A. Roebel and J.P. Puig et al., “On automatic drum transcription using non-negative matrix deconvolution and Itakura-Saito divergence”, Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 414-418, 2015.

[Sigtia 2016] S. Sigtia, E. Benetos and S. Dixon, “An End-to-End Neural Network for Polyphonic Piano Music Transcription”, IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 24, no. 5, pp. 927-939, 2016.

[TF 2017]

[Wang 2017] Q. Wang, R. Zhou and Y. Yan, “A two-stage approach to note-level transcription of a specific piano”, Applied Sciences, vol. 7, no. 9, pp. 901, 2017.

[Yeh 2010] C. Yeh, A. Roebel and X. Rodet, “Multiple Fundamental Frequency Estimation and Polyphony Inference of Polyphonic Music Signals”, IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 6, pp. 1116-1126, 2010.