Random Is Hard to Beat: Active Selection in Online DPO with Modern LLMs

Publication
ICLRW