ARANDU: a framework for retrieval-augmented generation over knowledge graphs with neuro-symbolic grounding

Xavier, Otávio Calaça

Files

Tese - Otávio Calaça Xavier - 2025.pdf (2.61 MB)

Date

2025-10-22

Authors

Xavier, Otávio Calaça

Publisher

Universidade Federal de Goiás

Abstract

This work addresses Knowledge Graph Question Answering (KGQA), a field transformed by the rise of Large Language Models (LLMs) but still limited by the generation of factually inconsistent information ("hallucinations") and by difficulty with complex reasoning. The central objective of this research was to develop and validate a neuro-symbolic architecture that overcomes the limitations of contemporary Retrieval-Augmented Generation (RAG) systems by jointly addressing three challenges: (1) evidence retrieval with low precision and recall, (2) loss of structural context in communication with the LLM, and (3) the absence of explicit logical orchestration in the reasoning process. To this end, the ARANDU framework was designed, implemented, and released as open source, materializing the proposed architecture. The methodology comprises an offline preparation stage, in which hybrid (lexical and semantic) indexes are built and logical rules are mined from the graph, and an online execution pipeline with three phases: (I) Hybrid Evidence Retrieval, which extracts a cohesive subgraph by combining lexical, semantic, and graph-based structured retrieval; (II) Logical Context Orchestration, which enriches the subgraph with logical rules and weights the most relevant inference paths; and (III) Neural Representation and Generation, in which a Graph Neural Network (GNN) encodes the subgraph into a vector representation (graph token) that, together with the textual context, conditions a compact LLM to generate the final answer. The empirical validation, conducted on the WebQSP and MetaQA datasets against baselines such as NaiveRAG, GraphRAG, and G-Retriever, showed that ARANDU achieved superior performance in most scenarios, especially in multi-hop reasoning tasks, with significant improvements in ranking quality metrics such as nDCG@10 and MRR. The results also confirmed that neural representation via GNN is more effective than textual linearization and that the architecture is computationally efficient. The research concludes that the synergy between optimized retrieval, logical orchestration, and neural representation, as implemented in ARANDU, constitutes a robust and effective solution that increases the fidelity and precision of answers in KGQA systems, thus validating the central hypothesis of this work.
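
The abstract reports gains in nDCG@10 and MRR without defining them. As a reading aid, the sketch below shows the standard textbook definitions of both metrics. It is not taken from the ARANDU code base: the function names are hypothetical, the linear-gain form of DCG is assumed, and the ideal ranking is computed only from the relevance labels passed in, so it is an approximation whenever relevant items are missing from the ranked list.

```python
import math
from typing import Sequence


def reciprocal_rank(relevances: Sequence[float]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[float]]) -> float:
    """Average the reciprocal rank over all queries (one ranked list each)."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r) for r in runs) / len(runs)


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(
        rel / math.log2(position + 1)
        for position, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG@k divided by the DCG@k of the same labels sorted best-first.

    Assumption: the ideal ordering is derived from the given labels only.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Each input list holds the relevance labels of the retrieved items in ranked order, for example `[0, 1, 0]` when only the second retrieved item is a correct answer.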

Keywords

Retrieval-augmented generation (RAG), LLM, Knowledge graphs, Neuro-symbolic AI, Question answering (QA), Logical orchestration

Citation

XAVIER, O. C. ARANDU: framework para geração aumentada por recuperação em grafos de conhecimento com fundamentação neuro-simbólica. 2025. 192 f. Tese (Doutorado em Ciência da Computação) - Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2025.

URI

https://repositorio.bc.ufg.br/tede/handle/tede/14968

Collections

Doctorate in Computer Science