Notes
Research notes and essays.
Writing on RLHF, large language models, embodied AI, and adjacent questions that do not fit neatly inside a paper.
2025-02-28
Reasoning Distillation Is Not Compression Alone
Distilling reasoning into smaller language models is not just a matter of shrinking parameters. It is a question of what kind of intermediate structure we want the student to internalize.
LLMs · CoT Distillation · Reasoning
2024-03-12
RLHF Should Be Treated as Feedback Modeling
RLHF is often described as reward optimization, but that framing is incomplete. In practice, the harder problem is modeling the structure and limits of the feedback itself.
RLHF · Alignment · Reward Modeling
2024-01-25
Embodied AI Needs Closed-Loop Learning
Embodied AI becomes genuinely interesting when perception, action, and feedback are part of the same loop. Static world modeling alone is not enough.
Embodied AI · Agents · Reinforcement Learning