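Decomposição de tarefas para problemas de linguagem natural: segmentação de hashtags e anotação de texto argumentativo (Task decomposition for natural language problems: hashtag segmentation and argumentative text annotation)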

Inuzuka, Marcelo Akira

Files

Tese - Marcelo Akira Inuzuka - 2025.pdf (11.73 MB)

Date

2025-04-24

Authors

Inuzuka, Marcelo Akira

Publisher

Universidade Federal de Goiás

Abstract

Corpus annotation is essential for training Natural Language Processing (NLP) models, yet it faces challenges such as high cognitive complexity, annotator inconsistency, and elevated costs. This thesis proposes task decomposition as a methodological strategy to modularize complex NLP processes, promoting greater conceptual clarity, scalability, and reproducibility. Initially focused on Argument Mapping, the research redirected its scope because the original task proved infeasible, and concentrated instead on identifying reusable patterns applicable to the annotation and automation stages. The work produced guidelines, a hierarchical decomposition algorithm, and artifacts such as annotated datasets and the Argmap platform, which supports collaborative annotation with quality control. The approach was validated through three empirical case studies: hashtag segmentation, keyphrase curation, and annotation of argumentative structures. Results demonstrate that decomposition improves consistency among agents (human or automatic), guideline clarity, and automation feasibility. The thesis also introduces the Recruiter–Selector architectural pattern, which structures a task into two independent modules (candidate generation and final selection) and applies to both annotation workflows and algorithms based on Large Language Models (LLMs). It concludes that decomposition driven by reusable patterns enhances efficiency and reliability in corpus construction and in the development of robust NLP systems, contributing to the systematization of annotation processes and their integration with automatic solutions.
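
To illustrate the two-module split that the abstract describes for the Recruiter–Selector pattern, the following is a minimal sketch applied to hashtag segmentation. It is not the thesis's implementation: the toy vocabulary `VOCAB`, the function names `recruit`, `select`, and `segment`, and the "fewest words" scoring rule are all assumptions made for this example.

```python
from typing import Callable, List, Optional

# Hypothetical toy vocabulary (an assumption for this sketch); a real
# system would rely on a large lexicon or a language model instead.
VOCAB = {"task", "decomposition", "de", "composition", "com", "position"}


def recruit(hashtag: str) -> List[List[str]]:
    """Recruiter: enumerate every split of the hashtag into VOCAB words."""
    text = hashtag.lstrip("#").lower()
    if not text:
        return [[]]  # one way to segment the empty string: no words
    candidates: List[List[str]] = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in VOCAB:
            # Extend this prefix with every segmentation of the remainder.
            for tail in recruit(text[end:]):
                candidates.append([head] + tail)
    return candidates


def select(
    candidates: List[List[str]],
    score: Callable[[List[str]], float] = len,
) -> Optional[List[str]]:
    """Selector: return the lowest-scoring candidate (fewest words by default)."""
    if not candidates:
        return None
    return min(candidates, key=score)


def segment(hashtag: str) -> Optional[List[str]]:
    """Compose the two independent modules: generate, then choose."""
    return select(recruit(hashtag))


if __name__ == "__main__":
    print(recruit("#TaskDecomposition"))
    print(segment("#TaskDecomposition"))
```

Because the two modules only share the candidate list, either one can be swapped independently in this sketch, for example by passing a different `score` function to `select` or by replacing `recruit` with a generator backed by human annotators or an LLM.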

Keywords

Corpus annotation, Natural Language Processing, Data quality, Reusable patterns, LLMs, Task decomposition

Citation

INUZUKA, M. A. Decomposição de tarefas para problemas de linguagem natural: segmentação de hashtags e anotação de texto argumentativo. 2025. 293 f. Tese (Doutorado em Ciência da Computação) - Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2025.

URI

https://repositorio.bc.ufg.br/tede/handle/tede/14460

Collections

Doutorado em Ciência da Computação (Doctorate in Computer Science)
