ProsoDeep

Deep understanding and modelling of the hierarchical structure of Prosody

The ProsoDeep project seeks a deeper understanding of the hierarchical structure of the language of prosody by using deep models. The results should help advance speech technologies that rely on the synthesis of prosody, e.g. text-to-speech (TTS) systems, or on its analysis, e.g. speech recognition and speech emotion recognition (SER).

The models

The models developed within the ProsoDeep project are based on the Superposition of Functional Contours (SFC) model, a top-down approach that decomposes prosodic contours into functionally relevant elementary contours, also called prosodic prototypes or clichés [sfc]. They include:

PySFC model: a Python implementation of the original SFC model [pysfc],
Weighted SFC (WSFC) model: incorporates the modelling of the prominence of the extracted prosodic prototypes [wsfc],
Variational Prosody Model (VPM): models the linguistic context-specific variability of the prosodic prototypes [vpm], and
Variational Recurrent Prosody Model (VRPM): decouples the context-specific variability from function scope [vrpm].
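
The superposition idea behind these models can be illustrated with a small sketch. It is not taken from the ProsoDeep or PySFC code and does not reflect their API: the class `ElementaryContour`, the function `superpose`, the function labels, and all numeric values are hypothetical, chosen only to show overlapping elementary contours being summed over their scopes. The optional `weight` is a simplified stand-in for the prominence weighting mentioned for the WSFC model, and is likewise an assumption.

```python
# Illustrative sketch only: NOT the ProsoDeep or PySFC API.
# All names and values below are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class ElementaryContour:
    """A hypothetical elementary contour: one value per syllable in its scope."""
    function: str        # label of the linguistic function (assumed name)
    start: int           # index of the first syllable covered by the scope
    values: List[float]  # contribution per syllable, e.g. pitch offset in semitones
    weight: float = 1.0  # prominence-style weight; 1.0 gives plain summation


def superpose(contours: List[ElementaryContour], n_syllables: int) -> List[float]:
    """Sum the overlapping (weighted) contours into one utterance-level contour."""
    total = [0.0] * n_syllables
    for contour in contours:
        for offset, value in enumerate(contour.values):
            total[contour.start + offset] += contour.weight * value
    return total


# A contour spanning the whole 5-syllable utterance...
declination = ElementaryContour("declarative", 0, [2.0, 1.0, 0.0, -1.0, -2.0])
# ...and a shorter one covering syllables 1 and 2, applied at half weight.
emphasis = ElementaryContour("emphasis", 1, [3.0, 1.0], weight=0.5)

contour = superpose([declination, emphasis], n_syllables=5)
print(contour)  # [2.0, 2.5, 0.5, -1.0, -2.0]
```

In this toy setting, decomposition is the inverse problem: given only the summed contour and the scopes of the functions, recover the elementary contours that produced it.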

Code

The code for all of the models is available as free software under the GNU General Public License v3 on GitHub: https://github.com/gerazov/prosodeep

Instructions on its use and on the various parameters will be made available soon on Read the Docs at https://prosodeep.readthedocs.io/

The PySFC implementation can be found at https://github.com/gerazov/pysfc

Acknowledgement

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions (MSCA) grant agreement No 745802: “ProsoDeep: Deep understanding and modelling of the hierarchical structure of Prosody”.