Reconhecimento de entidades nomeadas em textos informais no domínio legislativo (Named Entity Recognition in Informal Texts in the Legislative Domain)
Date
2023-04-19
Publisher
Universidade Federal de Goiás
Abstract
Named Entity Recognition (NER) is a challenging Natural Language Processing (NLP) task for a language as rich as Portuguese. When applied to short, informal texts, the task gains an additional layer of complexity, because models must handle a lexicon specific to the domain in question. In this work, we expand the UlyssesNER-Br NER corpus with Brazilian Portuguese comments on bills. We also enrich the annotated set with a formal corpus to analyze whether combining formal and informal texts from the same domain improves the performance of NER models. Finally, we conducted experiments with a Conditional Random Fields (CRF) model and a Bidirectional LSTM-CRF (BiLSTM-CRF) model, and then fine-tuned the BERT and RoBERTa language models for the NER task on our dataset. We conclude that formal texts helped identify entities in informal texts. The best model, a fine-tuned BERT, achieved an F1-score of 74.63%, surpassing the results reported in related work.
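The headline result above is an F1-score. NER results are usually reported at the entity level: a predicted entity counts only if both its boundaries and its type match the annotation exactly. The sketch below shows how such a score can be computed. It assumes BIO-tagged sentences and micro-averaging over all entity types. The record does not state which tagging scheme, averaging, or evaluation tool the dissertation used, and the tag names (`PER`, `ORG`, `LOC`) and the helpers `extract_spans` and `entity_f1` are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Set, Tuple

# An entity span: (sentence index, start token, end token exclusive, entity type).
Span = Tuple[int, int, int, str]


def extract_spans(sent_idx: int, tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (hypothetical helper).

    An I- tag that does not continue an open entity of the same type is treated
    as the start of a new entity (lenient handling of ill-formed sequences).
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and etype == label
        if start is not None and not continues:
            spans.add((sent_idx, start, i, etype))  # close the open entity
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label  # open a new entity
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 (exact span + type match)."""
    gold_spans: Set[Span] = set()
    pred_spans: Set[Span] = set()
    for k, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= extract_spans(k, g)
        pred_spans |= extract_spans(k, p)
    tp = len(gold_spans & pred_spans)  # entities predicted with correct boundaries and type
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data with made-up tags: the second entity in sentence 0 has the wrong type.
    gold = [["B-PER", "I-PER", "O", "B-ORG"], ["B-LOC", "O"]]
    pred = [["B-PER", "I-PER", "O", "B-LOC"], ["B-LOC", "O"]]
    p, r, f = entity_f1(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # prints "P=0.667 R=0.667 F1=0.667"
```

In this toy example, two of the three predicted entities match the gold annotation exactly. Precision, recall, and F1 therefore all equal 2/3. A mistyped entity counts against both precision and recall, which is one reason entity-level scores tend to be lower than token-level accuracy.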
Citation
COSTA, R. P. Reconhecimento de entidades nomeadas em textos informais no domínio legislativo. 2023. 70 p. Master's dissertation (Computer Science), Universidade Federal de Goiás, Goiânia, 2023.