2025-09-17
2025-09-17
2025-06-10
GOMES, J. R. S. Verificação semi-automática de fatos em português: enriquecimento de corpus via busca e extração de alegação. 2025. 119 leaves. Dissertation (Master's in Computer Science) - Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2025.
https://repositorio.bc.ufg.br/tede/handle/tede/14696
Disinformation often spreads faster than it can be fact-checked manually, which creates an urgent need for Semi-Automated Fact-Checking (SAFC) systems. For Portuguese, publicly available datasets (corpora) that integrate external evidence are scarce, even though such evidence is essential for developing robust automated fact-checking (AFC) systems; many existing resources focus solely on classification based on intrinsic text features. This dissertation addresses this gap by developing, applying, and analyzing a methodology to enrich Portuguese news corpora (Fake.Br, COVID19.BR, MuMiN-PT) with external evidence. The approach simulates a user’s verification process: a Large Language Model (LLM), specifically Gemini 1.5 Flash, extracts the main claim from each text, and search engine APIs (Google Search API, Google FactCheck Claims Search API) retrieve relevant external documents (evidence). In addition, a data validation and preprocessing framework, including near-duplicate detection, is introduced to improve the quality of the base corpora. The main results demonstrate the methodology’s viability: they provide enriched corpora and analyses that confirm the utility of claim extraction, the influence of the original data characteristics on the process, and the positive impact of enrichment on the performance of classification models (BERTimbau and Gemini 1.5 Flash), especially with fine-tuning.
This work contributes valuable resources and insights for advancing SAFC in Portuguese.
Open Access
http://creativecommons.org/licenses/by-nc-nd/4.0/
Processamento de linguagem natural; Fake News; Verificação semi-automática de fatos; Corpora em português
Natural language processing; Semi-automated fact-checking; Portuguese corpora
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Verificação semi-automática de fatos em português: enriquecimento de corpus via busca e extração de alegação
Semi-automated fact-checking in Portuguese: corpora enrichment using retrieval with claim extraction
Dissertation