Reconhecimento de entidades nomeadas em textos informais no domínio legislativo

Nenhuma Miniatura disponível

Data

2023-04-19

Título da Revista

ISSN da Revista

Título de Volume

Editor

Universidade Federal de Goiás

Resumo

Named Entity Recognition (NER) is a challenging task in Natural Language Processing (NLP) for a language as rich as Portuguese. When applied in a scenario appropriate to informal language and short texts, the task acquires a new layer of complexity, manipulating a lexicon specific to the domain in question. In this work, we expand the UlyssesNER-Br corpus for the NER task with Brazilian Portuguese comments on bill projects. Additionally, we enriched the annotated set with a formal corpus in order to analyze whether the combination of formal and informal texts from the same domain could improve the performance of NER models. Finally, we conducted experiments with a Conditional Random Fields (CRF) model, a Bidirectional LSTM-CRF model (BiLSTM-CRF), and subsequently fine-tuned a BERT and RoBERTa language model on the NER task with our dataset. We conclude that formal texts aided in identifying entities in informal texts. The best model was the fine-tuning of BERT which achieved an F1- score of 74.63%, surpassing the benchmark of related works.

Descrição

Citação

COSTA, R. P. Reconhecimento de entidades nomeadas em textos informais no domínio legislativo. 2023. 70 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Goiás, Goiânia, 2023.