Master 2018-2019
Internships for the SAR specialization
Speech transformation using cycle-consistent adversarial networks

Website: [OPEN][RI-HOME_2019-DM-VP-030] Speech transformation using cycle-consistent adversarial networks
Location: Technicolor, Rennes
Supervisors: Alexey Ozerov, Ngoc Duong, Gilles Puy
Dates: 01/02/2019 to 31/07/2019
Remuneration: 1200 euros/month (gross)
Keywords: ATIAM track: Acoustics, ATIAM track: Computer music, ATIAM track: Signal processing


Internship description

In this internship we address various speech transformation tasks that aim to modify a particular attribute of the speech signal. Attributes that may be modified include age, identity, gender, accent, mood, etc. Initially we will concentrate on age modification (aging or de-aging), since this task is of great interest for movie production. The proposed approaches will be based on deep learning methods, and more specifically on cycle-consistent adversarial networks.

We assume that some training data are provided. For example, to transform the speech of a 25-year-old man so that he sounds 60 years old, we assume a training set of speech signals comprising two subsets: one of 25-year-old speakers and another of 60-year-old speakers. However, to greatly widen the range of applications, we assume that the training dataset is “parallel-data-free”, i.e., the utterances pronounced in the two subsets are not necessarily the same, and thus cannot be aligned at the phoneme level.

The approach will consist of the following steps: (1) extracting speech-specific parameters, (2) transforming them into the target ones using a cycle-consistent adversarial network, and (3) resynthesizing the resulting speech from the transformed parameters. Prior to this processing, the cycle-consistent adversarial network must be pre-trained on the available parallel-data-free training dataset.
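To make the "parallel-data-free" idea concrete, here is a minimal sketch of the cycle-consistency term used to train such networks. It is an illustration under stated assumptions, not the method that will be developed during the internship: the two generators are reduced to plain Python functions, a speech frame is a short list of numbers, and all names (`cycle_consistency_loss`, `young_to_old`, `old_to_young`, the toy data) are hypothetical. A complete cycle-consistent adversarial objective also includes adversarial terms with discriminators, which are omitted here.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]                      # one frame of speech parameters (toy layout)
Generator = Callable[[Frame], List[float]]   # a generator network, reduced to a function


def l1_distance(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two frames of equal length."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def cycle_consistency_loss(
    young: Sequence[Frame],
    old: Sequence[Frame],
    young_to_old: Generator,
    old_to_young: Generator,
) -> float:
    """Average L1 error after a round trip through both generators.

    The two sets are never compared with each other, so they may differ
    in size and content: no aligned (parallel) utterances are needed.
    """
    forward = sum(
        l1_distance(old_to_young(young_to_old(x)), x) for x in young
    ) / len(young)
    backward = sum(
        l1_distance(young_to_old(old_to_young(y)), y) for y in old
    ) / len(old)
    return forward + backward


# Toy, unpaired data: three "young" frames and two "old" frames (assumed values).
young_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
old_frames = [[2.0, 8.0], [4.0, 2.0]]


def double(frame: Frame) -> List[float]:
    return [2.0 * v for v in frame]


def halve(frame: Frame) -> List[float]:
    return [0.5 * v for v in frame]


def quarter(frame: Frame) -> List[float]:
    return [0.25 * v for v in frame]


# Generators that invert each other give a zero cycle loss.
loss_consistent = cycle_consistency_loss(young_frames, old_frames, double, halve)
# Generators that do not invert each other are penalised.
loss_inconsistent = cycle_consistency_loss(young_frames, old_frames, double, quarter)

print(loss_consistent)    # 0.0
print(loss_inconsistent)  # 3.75
```

In an actual system the two functions would be neural networks trained by gradient descent, and the frames would be the speech-specific parameters extracted in step (1).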

Keywords: speech transformation, deep learning, machine learning, cycle-consistent adversarial networks

Associated themes: Electronics and signal processing

Sub-theme(s): Machine / Deep Learning, Video Processing

Candidate profile: fourth or fifth year of engineering school or university

Required skills: machine learning, audio processing, speech processing, Python or C++.

Fluent English: Yes

Internship duration and dates

Internship duration: 6 months

Estimated starting date: 01/02/2019

Estimated end date: 31/07/2019

Internship location: Rennes

Please send your résumé and cover letter, quoting the ref. RI-HOME_2019-DM-VP-030.


[1] ZHU, Jun-Yan, PARK, Taesung, ISOLA, Phillip, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint, 2017.

[2] KANEKO, Takuhiro and KAMEOKA, Hirokazu. Parallel-data-free voice conversion using cycle-consistent adversarial networks. arXiv preprint arXiv:1711.11293, 2017.

[3] MORISE, M., YOKOMORI, F., and OZAWA, K. WORLD: a vocoder-based high-quality speech synthesis system for real-time applications. IEICE Transactions on Information and Systems, vol. E99-D, no. 7, pp. 1877-1884, 2016.