Cross-cancer survival prediction using machine learning models

Abstract

Among the many challenges faced by healthcare systems, cancer remains one of the most urgent. Given the growing volume of patient data being collected, artificial intelligence has become a critical tool for enhancing early detection and optimizing treatment strategies. In this study, machine learning models trained on data for a specific type of cancer were employed to predict three-year survival after diagnosis for other cancer types. Two groups were considered: the most frequent cancers and those related to the digestive system. The data were extracted from the Hospital-Based Cancer Registries of São Paulo, covering 2000 to 2019, with a consistent selection protocol across all cancer types to enable cross-prediction. XGBoost and LightGBM algorithms were used, and the best-performing model was chosen for predictions across different topographies. A model trained on a combined dataset of oral cavity, esophageal, and stomach cancers achieved a balanced accuracy of 80.18%, compared with 79.92% for the stomach-specific model. Statistical testing showed no significant difference between these values, suggesting comparable predictive performance. These results illustrate the potential of cross-prediction, especially for rare cancer types where data scarcity represents a significant challenge.
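The cross-prediction idea and the balanced-accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the study used XGBoost and LightGBM, whereas here a hand-written threshold rule (`stage_rule`) stands in for a fitted model, and the cohorts, feature layout, and function names are all hypothetical toy values chosen only to show how one model is scored on several cancer types.

```python
from collections import defaultdict
from typing import Callable, Dict, Sequence, Tuple

Features = Tuple[int, ...]
Cohort = Tuple[Sequence[Features], Sequence[int]]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall; for two classes this is the mean of
    sensitivity and specificity."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("empty input")
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def cross_predict(predict: Callable[[Features], int],
                  cohorts: Dict[str, Cohort]) -> Dict[str, float]:
    """Score a single already-fitted model on every target cohort."""
    return {
        name: balanced_accuracy(y, [predict(x) for x in X])
        for name, (X, y) in cohorts.items()
    }


# Hypothetical stand-in for a trained classifier: predicts three-year
# survival (1) when the single feature, clinical stage, is 1 or 2.
def stage_rule(features: Features) -> int:
    return 1 if features[0] <= 2 else 0


# Toy cohorts (invented values): features are (clinical stage,),
# labels are 1 = alive at three years, 0 = not.
cohorts: Dict[str, Cohort] = {
    "stomach": ([(1,), (2,), (3,), (4,)], [1, 1, 0, 0]),
    "esophagus": ([(1,), (2,), (3,), (4,), (4,)], [1, 0, 0, 0, 1]),
}

scores = cross_predict(stage_rule, cohorts)
for site, score in scores.items():
    print(f"{site}: balanced accuracy = {score:.4f}")
```

Balanced accuracy is used in this sketch because survival labels are often imbalanced: a model that always predicts the majority class scores 0.5 on two classes, regardless of how skewed the cohort is.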

Citation

CARDOSO, Lucas Buk et al. Cross-cancer survival prediction using machine learning models. Scientific Reports, London, v. 16, n. 1, e9623, 2026. DOI: 10.1038/s41598-025-34133-w. Available at: https://www.nature.com/articles/s41598-025-34133-w. Accessed: 5 May 2026.