Model based on deep neural networks with gated recurrent units for image captioning by references

Date

2020-09-28

Publisher

Universidade Federal de Goiás

Abstract

Describing images in natural language has become a challenging task for computer vision. Image captioning can automatically create such descriptions through deep learning architectures that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs). It has several applications, such as describing objects in a scene to help blind people navigate unknown environments, and describing medical images for early diagnosis of diseases. However, architectures built on traditional RNNs suffer from exploding and vanishing gradients and can generate non-descriptive sentences. To address these difficulties, this study proposes a model based on the encoder-decoder structure, using CNNs to extract image features and multimodal gated recurrent units (GRUs) to generate the descriptions. Part-of-speech (PoS) tags and the likelihood function are used to generate weights in the GRU. The proposed method also performs knowledge transfer in the validation phase using the k-nearest neighbors (kNN) technique. Experimental results on the Flickr30k and MS-COCO data sets show that the proposed PoS-based model is statistically superior to the leading models, providing more descriptive captions that are closer to the expected captions, for both the predicted and the kNN-selected outputs. These results indicate an automatic improvement of image descriptions, benefiting several applications such as medical image captioning for early diagnosis of diseases.
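
The thesis text is not reproduced in this record, so the sketch below is only an illustrative encoder-decoder captioner with a CNN encoder and a GRU decoder, written in PyTorch. The class names, the ResNet-50 backbone, and all hyperparameters are assumptions made for this example; the PoS-based weighting, the multimodal fusion, and the kNN knowledge transfer described in the abstract are not implemented here.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class CNNEncoder(nn.Module):
    """Extracts a fixed-size feature vector from an image with a pretrained CNN."""

    def __init__(self, embed_size: int):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the classification head; keep only the convolutional feature extractor.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():              # backbone kept frozen in this sketch
            feats = self.backbone(images)  # (B, 2048, 1, 1)
        return self.fc(feats.flatten(1))   # (B, embed_size)


class GRUDecoder(nn.Module):
    """Generates a caption, one token at a time, conditioned on the image embedding."""

    def __init__(self, embed_size: int, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.gru = nn.GRU(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, img_embed: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Prepend the image embedding as the first step of the input sequence,
        # then let the GRU predict the next token at every position.
        word_embeds = self.embed(captions)                            # (B, T, E)
        inputs = torch.cat([img_embed.unsqueeze(1), word_embeds], 1)  # (B, T+1, E)
        hidden, _ = self.gru(inputs)                                  # (B, T+1, H)
        return self.fc(hidden)                                        # logits over the vocabulary
```

In the full model that the abstract describes, the decoder's output distribution would additionally be weighted with part-of-speech information and the likelihood function, and during validation the generated captions would be compared against reference captions with kNN to transfer knowledge.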

Citation

NOGUEIRA, T. C. Modelo baseado em redes neurais profundas com unidades recorrentes bloqueadas para legendagem de imagens por referências. 2020. 122 f. Tese (Doutorado em Engenharia Elétrica e da Computação) - Universidade Federal de Goiás, Goiânia, 2020.