Hi, I’m Yanjun Chen.

PhD student in RLHF & Embodied AI | INTJ
Builder of thinking agents.

Let’s explore minds that learn.

About Me

I'm a PhD student based in Hong Kong 🇭🇰, originally from China 🇨🇳. As an INTJ thinker and lifelong learner, I explore the frontiers of Artificial Intelligence with a special focus on RLHF (Reinforcement Learning from Human Feedback) and Embodied AI 🤖. My mind is always seeking structure, clarity, and elegant solutions.

📍 Location

Based in Hong Kong

🎯 MBTI

INTJ – The Architect

🎓 Education

PhD @ HK PolyU

💻 Programming

Python 🐍
C/C++ ⚙️

🌐 Languages

Chinese 🇨🇳
English 🇬🇧
Japanese 🇯🇵

🎨 Interests

Table Tennis 🏓
Video Games 🎮
KTV 🎤
Science & Tech 📖

Research

The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models

We uncover a paradox where moderately accurate reward models outperform stronger ones in RLHF training. This challenges the assumption that better reward models yield better langua...

RLHF, Alignment, Reward Models

PDF Code arXiv

EMNLP 2024 | Cited 1×

Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning

We explore integrating CoT into 3D vision-language alignment, showing significant gains through structured reasoning with our 3D-CoT benchmark.

Multimodal, Chain-of-Thought, 3D Reasoning

PDF arXiv

arXiv 2025 | Cited 0×

Corrected Soft Actor Critic for Continuous Control

We improve Soft Actor-Critic (SAC) by correcting the action-sampling bias introduced by the tanh transformation, achieving better convergence and performance on standard benchmarks.

Reinforcement Learning, SAC, Control

PDF arXiv

arXiv 2024 | Cited 0×

Instruction-Tuned LLMs Succeed in Document-Level MT Without Fine-Tuning—But BLEU Turns a Blind Eye

We show that instruction-tuned LLMs excel at document-level machine translation (docMT) without fine-tuning. BLEU fails to capture the improvements, and GPT-4 proves to be a better evaluator.

LLMs, Document MT, Evaluation

PDF Code arXiv

arXiv 2024 | Cited 2×

Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning

We dissect how granularity, supervision format, and teacher models affect CoT distillation into small language models across 7 datasets.

LLMs, CoT Distillation, Model Compression

PDF Code arXiv

arXiv 2025 | Cited 4×

Breaking the Pre-Planning Barrier: Real-Time Adaptive Coordination of Mission and Charging UAVs Using Graph RL

We introduce HGAM, a novel heterogeneous graph-based multi-agent RL model that enables real-time UAV coordination without pre-planned paths.

Multi-agent RL, Graph Networks, UAVs

PDF arXiv

arXiv 2025 | Cited 0×

📝 Blog

Test1

2025-02-28

LLMs, CoT Distillation, Model Compression

How granularity, supervision format, and teacher models affect CoT distillation into smaller LMs across 7 benchmarks.

Test3

2024-03-12

RLHF, Philosophy, AI Ethics

Reinforcement Learning from Human Feedback is not just a technical optimization but also a philosophical imperative in agent design.

Test2

2024-01-25

Embodied AI, Neuroscience, RL

Why grounding intelligence in the physical world is essential for the next generation of agents.

Contact

I'm always open to collaboration, interesting conversations, or just sharing ideas over coffee ☕. Whether it's about reinforcement learning, embodied AI, or something entirely unexpected, feel free to reach out!

Email: yan-jun.chen@connect.polyu.hk

WeChat: xzqm13143609845