Random Is Hard to Beat: Active Selection in Online DPO with Modern LLMs

Publication
I Can’t Believe It’s Not Better Workshop @ ICLR 2026