Conferences limit a paper to at most X pages, where X is usually 7-10. Communicating most of the important (theoretical and experimental) parts of my work in such a format means making sacrifices, and this often ends in me removing text that aims...
[Read More]
Thoughts on DPO and Offline RL
Direct Preference Optimization (DPO) is all the rage in LLMs right now, and rightly so! The derivation is neat (and very familiar to those experienced with reinforcement learning) and allows direct, preference-based finetuning of regression-trained LLMs without having to learn a reward model.
[Read More]
Some Interesting Offline RL Methods (Early 2024)
Intro
[Read More]
An Introduction to Preference-Based RL
Intro
[Read More]
An Overview of Model-Based Offline RL Methods
Intro
[Read More]
An Intro to Offline Reinforcement Learning
What is Reinforcement Learning?
[Read More]