Potential use of data-driven models to estimate and predict soybean yields at national scale in Brazil

dc.creatorMonteiro, Leonardo Amaral
dc.creatorRamos, Rafael Marconi
dc.creatorBattisti, Rafael
dc.creatorSoares, Johnny Rodrigues
dc.creatorOliveira, Julianne C.
dc.creatorFigueiredo, Gleyce Kelly Dantas Araújo
dc.creatorLamparelli, Rubens Augusto Camargo
dc.creatorNendel, Claas
dc.creatorLana, Marcos Alberto
dc.date.accessioned2025-01-09T19:19:02Z
dc.date.available2025-01-09T19:19:02Z
dc.date.issued2022
dc.description.abstractLarge-scale assessment of crop yields plays a fundamental role in agricultural planning and in achieving food security goals. In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sowing (DAS) in the main producing regions of Brazil, and evaluated the reliability of the “best” data-driven model as a tool for early prediction of soybean yields in an independent year. Our methodology explicitly describes a general approach for bringing together publicly available databases and building data-driven models (multiple linear regression, MLR; random forests, RF; and support vector machines, SVM) to predict yields at large scales using gridded weather and soil data. We filtered out counties with missing or suspicious yield records, resulting in a crop yield database containing 3450 records (23 years × 150 “high-quality” counties). RF and SVM had similar results in the calibration and validation steps, whereas MLR showed the poorest performance. Our analysis revealed the potential of data-driven models for predicting soybean yields at large scales in Brazil around one month before harvest (i.e. 90 DAS). Using a well-trained RF model to predict crop yield for a specific year at 90 DAS, the RMSE ranged from 303.9 to 1055.7 kg ha⁻¹, representing a relative error (rRMSE) between 9.2 and 41.5%. Although we demonstrated robust data-driven models for yield prediction at large scales in Brazil, there is still room for improving their accuracy. The inclusion of explanatory variables related to the crop (e.g. growing degree-days, flowering dates) and the environment (e.g. remotely sensed vegetation indices, number of dry and heat days during the cycle), as well as outputs from process-based crop simulation models (e.g. biomass, leaf area index and plant phenology), are potential strategies to improve model accuracy.
dc.identifier.citationMONTEIRO, Leonardo A. et al. Potential use of data-driven models to estimate and predict soybean yields at national scale in Brazil. International Journal of Plant Production, [s. l.], v. 16, p. 691-703, 2022. DOI: 10.1007/s42106-022-00209-0. Available at: https://link.springer.com/article/10.1007/s42106-022-00209-0. Accessed on: 13 Nov. 2024.
dc.identifier.doi10.1007/s42106-022-00209-0
dc.identifier.issn1735-6814
dc.identifier.issne-1735-8043
dc.identifier.urihttps://link.springer.com/article/10.1007/s42106-022-00209-0
dc.language.isoeng
dc.publisher.countryOther
dc.publisher.departmentEscola de Agronomia - EA (RMG)
dc.rightsRestricted access
dc.subjectLarge-scale analysis
dc.subjectMachine learning approaches
dc.subjectPublic databases
dc.subjectGeospatial and temporal variability
dc.subjectClimatic and soil variables
dc.titlePotential use of data-driven models to estimate and predict soybean yields at national scale in Brazil
dc.typeArticle
