Deep learning for named entity recognition in the legal domain (Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico)

Date

2019-12-05

Publisher

Universidade Federal de Goiás

Abstract

Named Entity Recognition (NER) is a challenging Natural Language Processing task for a language as rich as Portuguese. Applying it to a specific domain adds a further layer of complexity, because the model must handle a lexicon particular to that domain. This work studies the legal domain, focusing on Brazilian Labor Law. Deep Learning architectures that use word representations from static word embeddings and language models have shown state-of-the-art performance on the NER task. This work uses a model based on Deep Neural Networks and evaluates different forms of word representation. The evaluated models are applied to Portuguese in both the legal and general domains. To this end, language models based on the ELMo architecture were trained for both domains, and static word embeddings were trained specifically for the legal domain. A comparative study of the word embedding types applied to the NER task identifies the best type of pre-trained word embedding for each domain. To train the legal-domain NER models, ELMo language models, and static word embeddings, two different corpora were produced and annotated from a collection of public documents from the Brazilian Labor Court. The Portuguese general-domain NER model achieved a new state-of-the-art result on the HAREM benchmark, with an F-score of 83.22% in the selective scenario and 78.04% in the total scenario. For the Brazilian Labor Law domain, a model with an F-score of 93.81% was obtained.

Citation

CASTRO, P. V. Q. Aprendizagem profunda para reconhecimento de entidades nomeadas em domínio jurídico. 2019. 125 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Goiás, Goiânia, 2019.