Avaliação da qualidade da sintetização de fala gerada por modelos de redes neurais profundas (Quality assessment of speech synthesis generated by deep neural network models)

Oliveira, Frederico Santos de

Files

Tese - Frederico Santos de Oliveira - 2023.pdf (7.38 MB)

Date

2023-05-26

Authors

Oliveira, Frederico Santos de

Publisher

Universidade Federal de Goiás

Abstract

With the emergence of intelligent personal assistants, the need for high-quality conversational interfaces has increased. While text-based chatbots are popular, the development of voice interfaces is equally important. However, voice-based conversational models are evaluated primarily through the Mean Opinion Score (MOS), which relies on a manual and subjective process. In this context, this thesis contributes a new methodology for evaluating voice-based conversational interfaces, with a case study conducted in Brazilian Portuguese. The proposed methodology includes an architecture for predicting the quality of synthesized speech in Brazilian Portuguese, correlated with MOS. To evaluate the methodology, Text-to-Speech models were trained to create a dataset called BRSpeechMOS. The creation of this dataset is described in detail, along with a qualitative and quantitative analysis of it. A series of experiments was conducted to train various architectures, based on supervised and self-supervised learning, on the BRSpeechMOS dataset. The results confirm the hypothesis that models pre-trained on speech processing tasks, such as speaker verification and automatic speech recognition, produce suitable acoustic representations for predicting speech quality, contributing to the state of the art in evaluation methodologies for conversational models.

Keywords

Speech assessment, Synthesized speech assessment, MOS prediction, Deep neural networks, Quality prediction

Citation

OLIVEIRA, F. S. Avaliação da qualidade da sintetização de fala gerada por modelos de redes neurais profundas. 2023. 129 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal de Goiás, Goiânia, 2023.

URI

http://repositorio.bc.ufg.br/tede/handle/tede/12916

Collections

Doutorado em Ciência da Computação (Doctorate in Computer Science)
