Blog
Thoughts, experiments, and insights across RLHF, LLMs, Embodied AI, and beyond.
Test1
2025-02-28
How granularity, supervision format, and teacher models affect CoT distillation into smaller LMs across 7 benchmarks.
LLMs · CoT Distillation · Model Compression
Read more →
Test3
2024-03-12
Reinforcement Learning from Human Feedback isn't just a technical optimization but a philosophical imperative in agent design.
RLHF · Philosophy · AI Ethics
Read more →
Test2
2024-01-25
Why grounding intelligence in a physical world is essential for the next generation of agents.
Embodied AI · Neuroscience · RL
Read more →