News from the Analysis/Synthesis team
Frederik Bous, Guillaume Doras, Clément Le Moine Veillon, Nicolas Obin, Axel Roebel
This session will present recent results from the Analysis/Synthesis team. It will open with a short presentation of the new features in Version 1.3.0 of the Ircam Singing Synthesis Software ISiS, and will continue with more prospective results demonstrating the use of deep neural networks for singing and spoken voice manipulation, using the mel spectrogram as a parametric speech representation.
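To illustrate what a mel spectrogram is as a parametric speech representation, the sketch below computes one from scratch with NumPy (HTK mel scale, triangular filters). All parameter values here are illustrative defaults, not those used by the team's systems:

```python
import numpy as np

def hz_to_mel(f):
    # HTK mel scale: m = 2595 * log10(1 + f/700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose centers are evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, ctr):
            fb[i, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):
            fb[i, k] = (hi - k) / max(hi - ctr, 1)
    return fb

def mel_spectrogram(y, sr, n_fft=1024, hop=256, n_mels=80):
    # Frame the signal, window each frame, take the FFT power spectrum,
    # then project onto the mel filterbank
    window = np.hanning(n_fft)
    frames = [y[i:i + n_fft] * window
              for i in range(0, len(y) - n_fft, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return mel_filterbank(n_mels, n_fft, sr) @ power.T  # (n_mels, n_frames)

sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)  # 1 s of A4
M = mel_spectrogram(y, sr)
print(M.shape)  # → (80, 59)
```

Neural systems like those presented in this session typically operate on the log of such a representation, since it compresses dynamic range in a perceptually meaningful way.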
ISiS Version 1.3.0 (Guillaume Doras)
Accessing and manipulating the intermediate singing voice parameter representation.
Neural Vocoder (Axel Roebel)
Multi-band Excited WaveNet vocoder for resynthesis of spoken and singing voice from mel spectrograms.
Pitch Manipulation (Frederik Bous)
A deep auto-encoder with bottleneck that disentangles the F0 from the mel spectrogram.
Manipulation of perceived speech attitude (Clément Le Moine Veillon)
A deep neural network for manipulating the perceived attitude in speech.
Voice Identity transformation (Nicolas Obin)
Sequence to sequence model transforming speaker identity and preserving intonation.
Axel Roebel is research director at IRCAM and head of the Sound Analysis-Synthesis team (AS). He received the Diploma in electrical engineering from Hannover University in 1990 and the Ph.D. degree (summa cum laude) in computer science from the Technical University of Berlin in 1993. In 1994 he joined the German National Research Center for Information Technology (GMD-First) in Berlin, where he continued his research on using artificial neural networks to model time series from nonlinear dynamical systems. In 1996 he became assistant professor of digital signal processing in the communication science department of the Technical University of Berlin.
In 2000, he obtained a research scholarship at CCRMA, Stanford University, where he started an investigation into adaptive sinusoidal modeling. Later that year he joined the Sound Analysis-Synthesis team of IRCAM, where he obtained his Habilitation from Sorbonne Université in 2011 and became Research Director in 2013. He has developed state-of-the-art speech and music analysis and transformation algorithms and is the author of numerous libraries for signal analysis, synthesis, and transformation, such as SuperVP, software for music and speech signal analysis and transformation that has been integrated into numerous professional audio tools. He has recently started to investigate signal processing algorithms based on deep learning.
Nicolas Obin is currently associate professor at the Faculty of Sciences and Engineering of Sorbonne Université and research scientist in the Sound Analysis and Synthesis team at the Science and Technology for Sound and Music laboratory (Ircam, CNRS, Sorbonne Université). He received a PhD in computer science on the modeling of speech prosody and speaking style for text-to-speech synthesis (2011), for which he obtained the best PhD thesis award from La Fondation Des Treilles in 2011. He is deeply passionate about sound, waves, oscillations, and vibration, as well as the theory of information and communication. Over the years he has developed a strong interest in the behavior of, and communication between, humans, animals, and robots.
His research activities cover audio signal processing, artificial intelligence, machine learning, and statistical modeling of sound signals, with a specialization in speech processing and human communication. His main area of research is the structured generative modeling of complex human productions such as speech, singing, and music, with applications in speech synthesis and transformation, multi-modal virtual agent animation, and humanoid robotics. Lately he has initiated activities in the fields of bioacoustics and sound ecology.
As part of his artistic commitment to Ircam, he actively promotes digital science and technology for arts, culture, and heritage, and has collaborated with renowned musicians and artists such as Eric Rohmer, Philippe Parreno, Roman Polanski, Leos Carax, and Georges Aperghis.
Guillaume Doras received his MSc (2016) in acoustics and signal processing applied to music at Ircam/TelecomParis, and his PhD (2020) in music informatics from Sorbonne Université, with research focused on automatic cover detection with deep learning. He is currently a researcher at Ircam in Paris. His research interests relate to audio and deep learning applications, in particular singing voice recognition and synthesis. Before that, he led technical projects in the telecom industry for 15 years, living and working in various countries and continents.
Clément Le Moine Veillon is a PhD student at IRCAM in the Sound Analysis/Synthesis team. A former student of the ATIAM (Acoustics, Signal Processing and Computer Science Applied to Music) Master's program at IRCAM, he became familiar with the problems inherent in the expressiveness of the human voice during an internship with the team. His current work focuses on generative modeling of vocal attitudes based on deep learning and integrating subjective perceptual criteria.
Frederik Bous is a PhD student at Sorbonne Université and a laureate of EDITE's PhD scholarship, working at IRCAM in the Sound Analysis-Synthesis team.
In his PhD, on the topic of parametric speech synthesis with deep neural networks, he investigates meaningful parametric representations of speech that allow transformation and synthesis of the human voice. The PhD extends the research carried out for his Master's thesis, where he worked on singing synthesis.