Blog

Intuition Explained: Behavioral Supervisor Tuning

Posted on March 5, 2025

Conferences limit papers to at most X pages, where X is usually 7-10. Communicating the most important theoretical and experimental parts of my work in that format means making sacrifices, which often ends with me removing text that aims... [Read More]

Tags: tech, machine learning, reinforcement learning

Thoughts on DPO and Offline RL

Posted on June 22, 2024

Direct Preference Optimization is all the rage now in LLMs, and rightly so! The derivation is neat (and very familiar to those experienced with reinforcement learning) and allows direct, preference-based finetuning of regression-trained LLMs without having to learn a reward model. [Read More]

Tags: tech, machine learning, reinforcement learning

Some Interesting Offline RL Methods (Early 2024)

Posted on February 22, 2024

Intro [Read More]

Tags: tech, machine learning, reinforcement learning

An Introduction to Preference-Based RL

Posted on February 22, 2024

Intro [Read More]

Tags: tech, machine learning, reinforcement learning

An Overview of Model-Based Offline RL Methods

Posted on February 22, 2024

Intro [Read More]

Tags: tech, machine learning, reinforcement learning

An Intro to Offline Reinforcement Learning

Posted on December 15, 2023

What is Reinforcement Learning? [Read More]

Tags: tech, machine learning, reinforcement learning