Hi, I'm Yanjun Chen.

PhD student researching RLHF & Embodied AI | INTJ.
Builder of thinking agents.

Let's explore minds that learn.


About Me

I'm a PhD student based in Hong Kong 🇭🇰, originally from China 🇨🇳. As an INTJ thinker and lifelong learner, I explore the frontiers of Artificial Intelligence with a special focus on RLHF (Reinforcement Learning from Human Feedback) and Embodied AI 🤖. My mind is always seeking structure, clarity, and elegant solutions.

📍 Location

Based in Hong Kong

🎯 MBTI

INTJ – The Architect

🎓 Education

PhD @ HK PolyU

💻 Programming

  • Python 🐍
  • C/C++ ⚙️

🌐 Languages

  • Chinese 🇨🇳
  • English 🇬🇧
  • Japanese 🇯🇵

🎨 Interests

  • Table Tennis 🏓
  • Video Games 🎮
  • KTV 🎤
  • Science & Tech 📖

Research

The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models

We uncover a paradox in which moderately accurate reward models outperform stronger ones during RLHF training, challenging the assumption that better reward models yield better language models.
RLHF · Alignment · Reward Models
EMNLP 2024 · Cited 1×
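
For readers less familiar with the terminology, "reward model accuracy" here refers to pairwise accuracy on human preference data. A minimal sketch (toy tensors, hypothetical function name):

```python
import torch

def pairwise_accuracy(scores_chosen: torch.Tensor,
                      scores_rejected: torch.Tensor) -> float:
    """Fraction of preference pairs where the reward model scores the
    human-chosen response above the rejected one."""
    return (scores_chosen > scores_rejected).float().mean().item()

# Toy scores a reward model might assign to four preference pairs.
chosen = torch.tensor([1.2, 0.4, 2.0, -0.1])
rejected = torch.tensor([0.8, 0.9, 1.1, -0.5])
print(pairwise_accuracy(chosen, rejected))  # 0.75
```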

Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning

We explore integrating CoT into 3D vision-language alignment, showing significant gains through structured reasoning with our 3D-CoT benchmark.
Multimodal · Chain-of-Thought · 3D Reasoning
arXiv 2025 · Cited 0×
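
As a rough illustration of what structured reasoning over a 3D scene can look like, here is a hypothetical chain-of-thought prompt template (not the exact 3D-CoT benchmark format):

```python
# Hypothetical CoT-style prompt builder for 3D vision-language reasoning;
# the actual 3D-CoT benchmark format may differ.
def build_3d_cot_prompt(scene_description: str, question: str) -> str:
    return (
        f"Scene: {scene_description}\n"
        f"Question: {question}\n"
        "Reason step by step about the objects and their spatial relations, "
        "then state the final answer.\n"
        "Reasoning:"
    )

print(build_3d_cot_prompt(
    "A chair stands 0.5 m to the left of a table; a lamp sits on the table.",
    "Which object is closest to the lamp?",
))
```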

Corrected Soft Actor Critic for Continuous Control

We improve SAC by correcting action sampling bias introduced by tanh, achieving better convergence and performance on standard benchmarks.
Reinforcement Learning · SAC · Control
arXiv 2024 · Cited 0×
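
For context, the bias in question comes from sampling a Gaussian action and squashing it with tanh. A minimal sketch of the standard SAC-style sampling step, with the usual tanh log-density correction (this is the baseline the paper corrects, not the paper's fix itself):

```python
import math
import torch

def squashed_gaussian_sample(mean: torch.Tensor, log_std: torch.Tensor):
    """Standard SAC action sampling: a Gaussian sample squashed by tanh,
    with the log-density adjusted by the tanh Jacobian term."""
    std = log_std.exp()
    u = mean + std * torch.randn_like(mean)      # pre-squash Gaussian sample
    a = torch.tanh(u)                            # bounded action in (-1, 1)
    # log N(u; mean, std), summed over action dimensions
    log_prob = (-0.5 * ((u - mean) / std) ** 2 - log_std
                - 0.5 * math.log(2 * math.pi)).sum(-1)
    # change-of-variables correction for the tanh squashing
    log_prob = log_prob - torch.log(1 - a.pow(2) + 1e-6).sum(-1)
    return a, log_prob

action, log_prob = squashed_gaussian_sample(torch.zeros(2), torch.zeros(2))
```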

Instruction-Tuned LLMs Succeed in Document-Level MT Without Fine-Tuning—But BLEU Turns a Blind Eye

We show that instruction-tuned LLMs excel at document-level MT without any fine-tuning; BLEU fails to capture the improvements, while GPT-4 proves to be a better evaluator.
LLMs · Document MT · Evaluation
arXiv 2024 · Cited 2×
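
To make the evaluation point concrete, here is a small sketch of corpus-level BLEU with sacrebleu (illustrative strings only). An n-gram overlap score like this barely moves when document-level coherence changes, which is the gap the paper argues a GPT-4-based evaluator captures better:

```python
import sacrebleu  # pip install sacrebleu

# Hypothetical sentence-split document translation and reference.
hyps = ["He picked up the book.", "Then he put it back on the shelf."]
refs = ["He picked up the book.", "Then she put it back on the shelf."]

bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU = {bleu.score:.1f}")
# The pronoun error that breaks cross-sentence coherence ("he" vs. "she")
# costs only a few n-gram matches.
```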

Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning

We dissect how granularity, supervision format, and teacher models affect CoT distillation into small language models across 7 datasets.
LLMs · CoT Distillation · Model Compression
arXiv 2025 · Cited 4×
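
As a hypothetical sketch of what a CoT-distillation training example can look like (the paper studies several supervision formats and granularities, so this is just one illustrative layout):

```python
# Hypothetical helper: package a teacher model's chain-of-thought as a
# supervised fine-tuning example for a small student model.
def make_distillation_example(question: str, teacher_cot: str, answer: str) -> dict:
    return {
        "prompt": f"Question: {question}\nAnswer step by step.",
        "target": f"{teacher_cot}\nFinal answer: {answer}",
    }

example = make_distillation_example(
    question="If a train travels 60 km in 1.5 hours, what is its speed?",
    teacher_cot="Speed = distance / time = 60 km / 1.5 h = 40 km/h.",
    answer="40 km/h",
)
print(example["target"])
```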

Breaking the Pre-Planning Barrier: Real-Time Adaptive Coordination of Mission and Charging UAVs Using Graph RL

We introduce HGAM, a novel heterogeneous graph-based multi-agent RL model that enables real-time UAV coordination without pre-planned paths.
Multi-agent RL · Graph Networks · UAVs
arXiv 2025 · Cited 0×
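
For a sense of what a heterogeneous graph over the two UAV roles can look like, here is a purely illustrative construction with PyTorch Geometric (node features, edge type, and counts are made up; HGAM's actual graph is defined in the paper):

```python
import torch
from torch_geometric.data import HeteroData  # pip install torch_geometric

data = HeteroData()
data["mission_uav"].x = torch.randn(3, 8)    # 3 mission UAVs, 8-dim features
data["charging_uav"].x = torch.randn(2, 8)   # 2 charging UAVs, 8-dim features

# Edge type: which mission UAV can rendezvous with which charging UAV.
data["mission_uav", "rendezvous", "charging_uav"].edge_index = torch.tensor(
    [[0, 1, 2],   # source (mission UAV indices)
     [0, 0, 1]]   # target (charging UAV indices)
)
print(data)
```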

📝 Blog

View All →

Contact

I'm always open to collaboration, interesting conversations, or just sharing ideas over coffee ☕. Whether it's about reinforcement learning, embodied AI, or something entirely unexpected, feel free to reach out!