ARANDU: a framework for retrieval-augmented generation over knowledge graphs with neuro-symbolic grounding
Publisher
Universidade Federal de Goiás
Abstract
This work addresses Knowledge Graph Question Answering (KGQA), a field transformed by the rise of Large Language Models (LLMs) but still limited by the generation of factually inconsistent information ("hallucinations") and by difficulty with complex reasoning. The central objective of this research was to develop and validate a neuro-symbolic architecture that overcomes three limitations of contemporary Retrieval-Augmented Generation (RAG) systems and addresses them jointly: (1) evidence retrieval with low precision and recall, (2) loss of structural context when the retrieved evidence is passed to the LLM, and (3) the absence of explicit logical orchestration in the reasoning process. To this end, the ARANDU framework was designed, implemented, and released as open source as a concrete realization of the proposed architecture. The methodology comprises an offline preparation stage, in which hybrid (lexical and semantic) indexes are built and logical rules are mined from the graph, and an online execution pipeline with three phases: I) Hybrid Evidence Retrieval, which extracts a cohesive subgraph by combining lexical, semantic, and graph-based structured retrieval; II) Logical Context Orchestration, which enriches the subgraph with logical rules and weights the most relevant inference paths; and III) Neural Representation and Generation, in which a Graph Neural Network (GNN) encodes the subgraph into a vector representation (graph token) that, together with the textual context, conditions a compact LLM to generate the final answer. The empirical validation, conducted on the WebQSP and MetaQA datasets against baselines such as NaiveRAG, GraphRAG, and G-Retriever, showed that ARANDU outperformed them in most scenarios, especially in multi-hop reasoning tasks, with significant improvements in ranking-quality metrics such as nDCG@10 and MRR.
The results also confirmed that neural representation via a GNN is more effective than textual linearization of the subgraph and that the architecture is computationally efficient. The research concludes that the synergy between optimized retrieval, logical orchestration, and neural representation, as implemented in ARANDU, constitutes a robust and effective solution that increases the fidelity and precision of answers in KGQA systems, thus validating the central hypothesis of this work.
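For readers unfamiliar with the ranking-quality metrics named in the abstract, the following is a minimal sketch of how MRR and nDCG@10 are commonly computed. It is not taken from the ARANDU code base: the function names are hypothetical, and the linear-gain form of DCG is an assumption (the thesis may use a different gain variant or tie-handling rule).

```python
import math

def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs, one per question."""
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (linear gain, log2 discount)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))

def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k: DCG of the produced ranking divided by the DCG of the ideal ranking.

    `relevance` maps an item id to its graded relevance; missing ids count as 0.
    """
    gains = [relevance.get(item, 0.0) for item in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0
```

Both metrics lie in [0, 1] and reward placing correct items near the top of the ranking; MRR considers only the first correct item, whereas nDCG@10 accounts for every relevant item in the top ten.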
Citation
XAVIER, O. C. ARANDU: framework para geração aumentada por recuperação em grafos de conhecimento com fundamentação neuro-simbólica. 2025. 192 f. Tese (Doutorado em Ciência da Computação) - Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2025.