Task decomposition for natural language problems: hashtag segmentation and argumentative text annotation

Publisher

Universidade Federal de Goiás

Abstract

Corpus annotation is essential for training Natural Language Processing (NLP) models, yet it faces challenges such as high cognitive complexity, annotator inconsistency, and high costs. This thesis proposes task decomposition as a methodological strategy to modularize complex NLP processes, promoting greater conceptual clarity, scalability, and reproducibility. The research initially focused on Argument Mapping but was redirected because the original task proved infeasible, and it concentrated instead on identifying reusable patterns applicable to the annotation and automation stages. The work produced guidelines, a hierarchical decomposition algorithm, and artifacts such as annotated datasets and the Argmap platform, which supports collaborative annotation with quality control. The approach was validated through three empirical case studies: hashtag segmentation, keyphrase curation, and annotation of argumentative structures. Results show that decomposition improves consistency among agents (human or automatic), guideline clarity, and automation feasibility. The thesis also introduces the Recruiter–Selector architectural pattern, which structures a task into two independent modules, candidate generation and final selection, and applies to both annotation workflows and algorithms based on Large Language Models (LLMs). It concludes that decomposition driven by reusable patterns enhances efficiency and reliability in corpus construction and in the development of robust NLP systems, contributing to the systematization of annotation processes and their integration with automatic solutions.

Citation

INUZUKA, M. A. Decomposição de tarefas para problemas de linguagem natural: segmentação de hashtags e anotação de texto argumentativo. 2025. 293 f. Tese (Doutorado em Ciência da Computação) - Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2025.