Named entity recognition in informal texts in the legislative domain
Publisher
Universidade Federal de Goiás
Abstract
Named Entity Recognition (NER) is a challenging Natural Language Processing (NLP) task for a language as rich as Portuguese. When applied to informal language and short texts, the task gains a further layer of complexity, since it must handle a lexicon specific to the domain in question. In this work, we expand the UlyssesNER-Br corpus for the NER task with Brazilian Portuguese comments on bills. Additionally, we enrich the annotated set with a formal corpus to analyze whether combining formal and informal texts from the same domain improves the performance of NER models. Finally, we conduct experiments with a Conditional Random Fields (CRF) model and a Bidirectional LSTM-CRF model (BiLSTM-CRF), and then fine-tune the BERT and RoBERTa language models on the NER task with our dataset. We conclude that formal texts helped identify entities in informal texts. The best model, a fine-tuned BERT, achieved an F1-score of 74.63%, surpassing the benchmark set by related works.
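The abstract reports results as an F1-score but does not spell out the evaluation protocol. As an illustration only, the sketch below assumes the common CoNLL-style convention: BIO-tagged token sequences, with an entity counted as correct only if its boundaries and type match the gold annotation exactly, and precision, recall, and F1 micro-averaged over all entities. The function names `bio_to_spans` and `entity_f1` and the tag names `PER`/`LOC` are hypothetical and are not taken from the dissertation or from UlyssesNER-Br.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical span representation: (sentence index, start token, end token exclusive, entity type)
Span = Tuple[int, int, int, str]


def bio_to_spans(tags: List[str], sent_id: int = 0) -> Set[Span]:
    """Convert one sentence's BIO tags into a set of entity spans (illustrative helper)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    ent_type: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes an entity at sentence end
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != ent_type)
        if tag == "O" or starts_new:
            if start is not None:  # close the currently open entity
                spans.add((sent_id, start, i, ent_type))
                start, ent_type = None, None
            if starts_new:
                # Lenient assumption: an I- tag with no matching open entity starts a new one
                start, ent_type = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match entity-level precision, recall, and F1."""
    gold_spans: Set[Span] = set()
    pred_spans: Set[Span] = set()
    for k, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= bio_to_spans(g, k)
        pred_spans |= bio_to_spans(p, k)
    tp = len(gold_spans & pred_spans)  # entities matching in boundaries and type
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "O"]]  # the LOC entity is missed
    print(entity_f1(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```

Because a partially recovered entity earns no credit under this convention, entity-level F1 is usually stricter than token-level accuracy, which matters when comparing scores across NER papers.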
Citation
COSTA, R. P. Reconhecimento de entidades nomeadas em textos informais no domínio legislativo. 2023. 70 leaves. Master's dissertation (Computer Science) - Universidade Federal de Goiás, Goiânia, 2023.