Evaluation of oversampling of voice data in the automatic classification of Parkinson's disease
Date
2024-12-19
Publisher
Universidade Federal de Goiás
Abstract
This study investigates a possible bias introduced when vocal signals are oversampled by data windowing. Previous studies report such a bias for gait data when windows from the same recording are treated as independent samples, and statistical studies show that data from the same individual carry similar information. Three databases of vocal signals were used, two unbalanced and one balanced. The K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Naive Bayes, and Decision Tree (DT) algorithms were applied, with pre-processing by StandardScaler and an analysis of the behavior of principal component analysis (PCA). Models were evaluated on all three databases with k-fold cross-validation (k = 5), adapted to produce scenarios with and without bias in the training data. Models evaluated without accounting for the bias showed inflated performance, while the rigorous approach yielded more modest results. The study concludes that having samples from the same individual in both the training and test sets can inflate model performance, and that applying oversampling correctly is crucial to developing reliable models for diagnosing Parkinson's disease (PD).
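The distinction between the biased and unbiased scenarios can be illustrated with a minimal sketch. It is not the author's code: the function names `window_signal` and `subject_wise_folds`, the toy recordings, and the window width are all hypothetical, chosen only to show how windows from one individual can be kept on a single side of each train/test split. The sketch uses only the Python standard library; scikit-learn's `GroupKFold` offers a comparable group-aware split.

```python
import random


def window_signal(signal, width, step):
    """Split one recording into fixed-width windows (the oversampling step)."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]


def subject_wise_folds(subject_ids, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no subject is on both sides."""
    subjects = sorted(set(subject_ids))
    if len(subjects) < k:
        raise ValueError("need at least k distinct subjects")
    random.Random(seed).shuffle(subjects)
    # Assign whole subjects, not individual windows, to folds.
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    for fold in range(k):
        test_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        train_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        yield train_idx, test_idx


# Toy data (assumption): 10 hypothetical subjects, one 100-sample recording each.
recordings = {f"subj{n:02d}": [float(n)] * 100 for n in range(10)}

windows, subject_ids = [], []
for subj, signal in recordings.items():
    for w in window_signal(signal, width=20, step=20):
        windows.append(w)
        subject_ids.append(subj)  # remember which individual each window came from

folds = list(subject_wise_folds(subject_ids, k=5))
```

A plain k-fold split over `windows` would shuffle windows without regard to `subject_ids`, so the same individual could appear in both training and test data; that is the situation the abstract describes as producing inflated performance.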
Keywords
Parkinson's disease, Machine learning, Diagnosis
Citation
SILVA, Matheus Isac da. Avaliação da sobreamostragem de dados de voz na classificação automática da doença de Parkinson. 2024. 20 f. Trabalho de Conclusão de Curso (Bacharelado em Engenharia de Computação) – Escola de Engenharia Elétrica, Mecânica e de Computação, Universidade Federal de Goiás, Goiânia, 2024.