Bayesian reward models for LLM alignment

Published in the Workshop on Secure and Trustworthy Large Language Models, 2024

Recommended citation: Yang AX, Robeyns M, Coste T, Wang J, Bou-Ammar H, and Aitchison L (2024). "Bayesian reward models for LLM alignment." Workshop on Secure and Trustworthy Large Language Models.