LLM Alignment via Reinforcement Learning: Evaluation of Human Preference Methods

Dates: 2026-05-06; 2026-05-06; 2025-12-09
Citation: NOVAIS, Artur Matos Andrade. Alinhamento de LLMs via aprendizado por reforço: avaliação de métodos de preferência humana. 2025. 100 f. Undergraduate thesis (Bachelor's in Artificial Intelligence) - Instituto de Informática, Universidade Federal de Goiás, Goiânia, 2025.
URI: https://repositorio.bc.ufg.br//handle/ri/30277
Abstract: This Course Completion Report aims to bring together the results of my journey to become an expert in LLM Alignment. An illustration and its narrative describe the work periods. The Appendices contain the Delivery Acceptance Terms and the results obtained during each work period.
Language: Portuguese (por)
Access: Open Access
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Keywords: Artificial intelligence (Inteligência artificial); Language models (Modelos de linguagem); Reinforcement learning (Aprendizado por reforço)
Type: Undergraduate thesis (Trabalho de conclusão de curso de graduação, TCCG)