Style control in Brazilian Portuguese speech synthesis using deep neural networks

Date

2021-08-26

Publisher

Universidade Federal de Goiás

Abstract

The popularization of chatbots, computer programs that emulate a dialogue between people and machines, has driven the development of human-computer interface solutions. In this context, there is considerable demand for conversational voice interfaces in which the machine can at least understand spoken words and synthesize speech. Neural networks have established a new state of the art in speech synthesis: Mean Opinion Score (MOS) tests show that speech synthesized by these methods has quality similar to speech recorded by humans in a studio. Despite this quality, these methods have difficulty reproducing the many ways the same text can be spoken and conveying information that goes beyond the content, such as emotion, intensity, speed, and emphasis. New models have therefore been developed to control the style of the generated speech and to transfer style from one audio segment to another. Despite these advances, existing studies concentrate on the synthesis of English or Mandarin text, and applications of style control methods to Brazilian Portuguese are scarce or non-existent. The research presented here developed a neural network architecture for speech synthesis in Brazilian Portuguese that can control the style of the synthesized speech, allowing changes in pitch and speed. In a MOS evaluation, the resulting model scored 4.1 on a scale from 1 (Poor) to 5 (Excellent), indicating that listeners judged the synthesized audio to be of good quality. Audio samples generated by the developed models are available at shorturl.at/etFJP and https://mrfalante.com.br/sobre. Real-time synthesis using models resulting from this research can be performed at https://cybervox.ai.

Citation

TUNNERMANN, Daniel. Controle de estilo na síntese de voz em português brasileiro usando redes neurais profundas. 2021. 50 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Goiás, Goiânia, 2021.