Application of CNN and LLM to Software Fault Localization (Aplicação de CNN e LLM na Localização de Defeitos de Software)

Date

2024-10-16

Publisher

Universidade Federal de Goiás

Abstract

The increase in the number or complexity of computational systems has led to more frequent software defects. Industry invests heavily in debugging, and a considerable share of that cost comes from locating the element responsible for a defect. Automated fault localization techniques have been widely explored, with recent advances driven by deep learning models that combine different types of information about defective source code. However, the accuracy of these techniques still has room for improvement, which points to open challenges in the field. This work aims to formalize and investigate the most impactful aspects of fault localization techniques, proposing a framework for characterizing approaches to the problem and two solution methodologies: (a) one based on convolutional neural networks (CNNs) and (b) one based on large language models (LLMs). Experiments on public datasets in Java and Python showed that the CNN-based approach is comparable to traditional methods but inferior to other methods in the literature. The LLM-based approach, in contrast, greatly outperformed heuristics such as Ochiai and Tarantula and proved competitive with more recent literature. An experiment in a scenario free from the data leakage problem showed that LLM-based approaches can be improved by combining them with the Ochiai heuristic.
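
For readers unfamiliar with the Ochiai and Tarantula heuristics named in the abstract, the following is a minimal sketch of their standard spectrum-based formulas. It is not taken from the thesis: the function names, the `coverage`/`outcomes` data layout, and the toy data are hypothetical and chosen only for illustration. Here `ef` and `ep` are the numbers of failing and passing tests that execute a program element.

```python
import math

def ochiai(ef, ep, total_failed):
    """Standard Ochiai score: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom > 0 else 0.0

def tarantula(ef, ep, total_failed, total_passed):
    """Standard Tarantula score: failing ratio over (failing ratio + passing ratio)."""
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total > 0 else 0.0

def rank_by_ochiai(coverage, outcomes):
    """Rank program elements from most to least suspicious.

    coverage: dict mapping element name -> set of test ids that execute it
    outcomes: dict mapping test id -> True if the test passed
    (both layouts are assumptions made for this sketch)
    """
    failed = {t for t, passed in outcomes.items() if not passed}
    scores = {}
    for element, tests in coverage.items():
        ef = len(tests & failed)   # failing tests covering the element
        ep = len(tests - failed)   # passing tests covering the element
        scores[element] = ochiai(ef, ep, len(failed))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: t3 is the only failing test.
coverage = {
    "line_10": {"t1", "t2", "t3"},
    "line_20": {"t3"},
    "line_30": {"t1", "t2"},
}
outcomes = {"t1": True, "t2": True, "t3": False}
ranking = rank_by_ochiai(coverage, outcomes)
```

In this toy example the element executed only by the failing test ranks first, which is the intuition behind both heuristics: elements covered mostly by failing tests are more suspicious.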

Citation

Basílio Neto, Altino Dantas. Aplicação de CNN e LLM na Localização de Defeitos de Software. Goiânia, 2024. 178 f. Tese (Doutorado em Ciência da Computação) - Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2024.