Notes

Research notes and essays.

Writing on RLHF, large language models, embodied AI, and adjacent questions that do not fit neatly inside a paper.

2025-02-28

Reasoning Distillation Is Not Compression Alone

Distilling reasoning into smaller language models is not just a matter of shrinking parameters. It is a question of what kind of intermediate structure we want the student to internalize.

Tags: LLMs, CoT Distillation, Reasoning

2024-03-12

RLHF Should Be Treated as Feedback Modeling

RLHF is often described as reward optimization, but that framing is incomplete. In practice, the real challenge is modeling the structure and limits of feedback.

Tags: RLHF, Alignment, Reward Modeling

2024-01-25

Embodied AI Needs Closed-Loop Learning

Embodied AI becomes genuinely interesting when perception, action, and feedback are part of the same loop. Static world modeling alone is not enough.

Tags: Embodied AI, Agents, Reinforcement Learning