Avaliação da qualidade da sintetização de fala gerada por modelos de redes neurais profundas

Nenhuma Miniatura disponível

Data

2023-05-26

Título da Revista

ISSN da Revista

Título de Volume

Editor

Universidade Federal de Goiás

Resumo

With the emergence of intelligent personal assistants, the need for high-quality conversational interfaces has increased. While text-based chatbots are popular, the development of voice interfaces is equally important. However, the primary method for evaluating voice-based conversational models is mainly done through Mean Opinion Score (MOS), which relies on a manual and subjective process. In this context, this thesis aims to contribute with a new methodology for evaluating voice-based conversational interfaces, with a case study specifically conducted in Brazilian Portuguese. The proposed methodology includes an architecture for predicting the quality of synthesized speech in Brazilian Portuguese, correlated with MOS. To evaluate the proposed methodology, this work included training Text-to-Speech models to create the dataset called BRSpeechMOS. Details about the creation of this dataset are presented, along with a qualitative and quantitative analysis of it. A series of experiments were conducted to train various architectures using the BRSpeechMOS dataset. The architectures used are based on supervised and self-supervised learning. The results obtained confirm the hypothesis raised that pre-trained models on voice processing tasks such as speaker verification and automatic speech recognition produce suitable acoustic representations for the task of predicting speech quality, contributing to the advancement of the state of the art in the development of evaluation methodologies for conversational models.

Descrição

Citação

OLIVEIRA, F. S. Avaliação da qualidade da sintetização de fala gerada por modelos de redes neurais profundas. 2023. 129 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal de Goiás, Goiânia, 2023.