Evaluation of voice data oversampling in the automatic classification of Parkinson's disease

Date

2024-12-19

Publisher

Universidade Federal de Goiás

Abstract

This study investigates a possible bias introduced by oversampling vocal signals through data windowing. Previous studies report such a bias for gait data when samples are treated as independent, and statistical studies show that data from the same individual carry similar information. Three databases of vocal signals were used, two unbalanced and one balanced. The K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Naive Bayes, and Decision Tree (DT) algorithms were applied, with StandardScaler preprocessing and an analysis of PCA behavior. Models were evaluated on all three databases with 5-fold cross-validation, adapted to scenarios with and without bias in the training data. Models evaluated without accounting for the bias showed inflated performance, while the rigorous approach yielded more modest results. The study concludes that having samples from the same individual in both the training and test sets can inflate model performance, and that applying oversampling correctly is crucial to developing reliable models for diagnosing Parkinson's disease (PD).
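The difference between the biased and the rigorous evaluation can be illustrated with a minimal sketch. It is not the author's code: the helper names (`window_signal`, `record_wise_folds`, `subject_wise_folds`, `leaking_folds`), the toy recordings, and the window and step sizes are all assumptions chosen for illustration. It windows one recording per subject, builds 5 folds in two ways, and counts the folds whose test subjects also appear in training.

```python
import random

def window_signal(signal, window_size, step):
    """Split one recording into fixed-length windows (the oversampling step)."""
    return [signal[i:i + window_size]
            for i in range(0, len(signal) - window_size + 1, step)]

def record_wise_folds(groups, k, seed=0):
    """Naive k-fold: shuffle windows and ignore which subject produced them."""
    idx = list(range(len(groups)))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def subject_wise_folds(groups, k, seed=0):
    """Subject-wise k-fold: all windows of a subject land in the same fold."""
    subjects = sorted(set(groups))
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    folds = [[] for _ in range(k)]
    for i, g in enumerate(groups):
        folds[fold_of[g]].append(i)
    return folds

def leaking_folds(folds, groups):
    """Count folds whose test subjects also appear in the training folds."""
    count = 0
    for f, test_idx in enumerate(folds):
        test_subjects = {groups[i] for i in test_idx}
        train_subjects = {groups[i]
                          for j, fold in enumerate(folds) if j != f
                          for i in fold}
        if test_subjects & train_subjects:
            count += 1
    return count

# Toy data (hypothetical): 10 subjects, one 100-sample recording each.
windows, groups = [], []
for subject in range(10):
    recording = [float(subject)] * 100
    for w in window_signal(recording, window_size=20, step=10):
        windows.append(w)
        groups.append(subject)

naive = record_wise_folds(groups, k=5)
strict = subject_wise_folds(groups, k=5)
print("record-wise folds with leakage:", leaking_folds(naive, groups))
# Subject-wise splitting has no leakage by construction.
print("subject-wise folds with leakage:", leaking_folds(strict, groups))
```

With scikit-learn, `GroupKFold` from `sklearn.model_selection` provides a comparable subject-wise split when the subject identifier is passed as `groups`.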

Keywords

Parkinson’s disease, Machine learning, Diagnosis

Citation

SILVA, Matheus Isac da. Avaliação da sobreamostragem de dados de voz na classificação automática da doença de Parkinson. 2024. 20 f. Trabalho de Conclusão de Curso (Bacharelado em Engenharia de Computação) – Escola de Engenharia Elétrica, Mecânica e de Computação, Universidade Federal de Goiás, Goiânia, 2024.