Abordagem de seleção de características baseada em AUC com estimativa de probabilidade combinada a técnica de suavização de La Place

Nenhuma Miniatura disponível

Data

2023-09-28

Título da Revista

ISSN da Revista

Título de Volume

Editor

Universidade Federal de Goiás

Resumo

The high dimensionality of many datasets has led to the need for dimensionality reduction algorithms that increase performance, reduce computational effort and simplify data processing in applications focused on machine learning or pattern recognition. Due to the need and importance of reduced data, this paper proposes an investigation of feature selection methods, focusing on methods that use AUC (Area Under the ROC curve). Trends in the use of feature selection methods in general and for methods using AUC as an estimator, applied to microarray data, were evaluated. A new feature selection algorithm, the AUC-based feature selection method with probability estimation and the La PLace smoothing method (AUC-EPS), was then developed. The proposed method calculates the AUC considering all possible values of each feature associated with estimation probability and the LaPlace smoothing method. Experiments were conducted to compare the proposed technique with the FAST (Feature Assessment by Sliding Thresholds) and ARCO (AUC and Rank Correlation coefficient Optimization) algorithms. Eight datasets related to gene expression in microarrays were used, all of which were used for the cross-validation experiment and four for the bootstrap experiment. The results showed that the proposed method helped improve the performance of some classifiers and in most cases with a completely different set of features than the other techniques, with some of these features identified by AUC-EPS being critical for disease identification. The work concluded that the proposed method, called AUC-EPS, selects features different from the algorithms FAST and ARCO that help to improve the performance of some classifiers and identify features that are crucial for discriminating cancer.

Descrição

Citação

RIBEIRO, G. A. S. Abordagem de seleção de características baseada em AUC com estimativa de probabilidade combinada a técnica de suavização de La Place. 2023. 113 f. Tese (Doutorado em Ciência da Computação) - Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2023.