Test1
How granularity, supervision format, and teacher models affect CoT distillation into smaller LMs across 7 benchmarks.
PhD in RLHF & Embodied AI. | INTJ.
Builder of thinking agents.
Let's explore minds that learn.
I'm a PhD student based in Hong Kong 🇭🇰, originally from China 🇨🇳. As an INTJ thinker and lifelong learner, I explore the frontiers of Artificial Intelligence with a special focus on RLHF (Reinforcement Learning from Human Feedback) and Embodied AI 🤖. My mind is always seeking structure, clarity, and elegant solutions.
Based in Hong Kong
INTJ: The Architect
PhD @ HK PolyU
Reinforcement Learning from Human Feedback isn't just a technical optimization, but a philosophical imperative in agent design.
Why grounding intelligence in a physical world is essential for the next generation of agents.
I'm always open to collaboration, interesting conversations, or just sharing ideas over coffee ☕. Whether it's about reinforcement learning, embodied AI, or something entirely unexpected, feel free to reach out!