Yanjun Chen

PhD Student

Department of Computing, The Hong Kong Polytechnic University

I am a PhD student at The Hong Kong Polytechnic University working on reinforcement learning with human feedback, reward modeling, reasoning in large language models, and embodied intelligence.

My broader goal is to build learning-based agents that are more reliable, adaptive, and practically useful in complex environments.

Reinforcement learning with human feedback · Reward modeling · Large language models · LLM reasoning · Embodied AI

Citations: 108 (since 2021: 108)

h-index: 5 (since 2021: 5)

i10-index: 3 (since 2021: 3)

Publications: 11

Last synced Mar 8, 2026

Current topics: Reinforcement Learning / Reward Modeling / Large Language Models / Embodied AI. Academic profile data is synchronized from Google Scholar every three days.


Department of Computing

The Hong Kong Polytechnic University

Hong Kong

About

Biography and research direction.

My research sits at the intersection of language-model alignment, reinforcement learning, and embodied intelligence. I am particularly interested in how agents can learn from feedback, reason more effectively, and behave more reliably in open-ended environments.

Across my work, I care about methods that are both rigorous and useful: clear problem formulation, reproducible experiments, and systems that can transfer beyond narrow benchmark settings.

I welcome academic collaboration, visiting opportunities, and research-oriented industry conversations related to RLHF, agent training, reasoning, and embodied AI.

Research themes

RLHF, reward modeling, reasoning in LLMs, reinforcement learning, and embodied intelligence.

Approach

Clear problem framing, reproducible experiments, and methods that transfer beyond narrow benchmarks.

Collaboration

Particularly interested in work that connects foundational research with practical agent systems and evaluation.

Updates

Recent academic activity.

A compact view of recent publications and the current Google Scholar snapshot. This section is meant to give collaborators, faculty, and industry researchers a fast overview of ongoing activity.

Snapshot

Google Scholar currently lists 11 publications and 108 citations, with an h-index of 5.

Last synchronization: Mar 8, 2026.

View full Scholar profile

Recent publications

2025

Unveiling the key factors for distilling chain-of-thought reasoning

Findings of the Association for Computational Linguistics: ACL 2025, 15094-15119, 2025

Citations

37

2025

Fine-grained and multi-dimensional metrics for document-level machine translation

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of …, 2025

Citations

7

Research

Publications and scholarly profile.

My publications span RLHF, reward modeling, reasoning in LLMs, reinforcement learning, and embodied intelligence. Citation metrics and publication metadata below are synchronized from my public Google Scholar profile.

Open Google Scholar


Selected work

Representative papers

arXiv 2025

Reasoning beyond language: A comprehensive survey on latent chain-of-thought reasoning

X Chen, A Zhao, H Xia, X Lu, H Wang, Y Chen, W Zhang, J Wang, W Li, ...

A survey of latent chain-of-thought reasoning that maps out how reasoning can emerge beyond explicit verbalized steps, with implications for evaluation, supervision, and model design.

Citations

39

LLM Reasoning · Survey · Chain-of-Thought

ACL Findings 2025

Unveiling the key factors for distilling chain-of-thought reasoning

X Chen, Z Sun, G Wenjin, M Zhang, Y Chen, Y Sun, H Su, Y Pan, ...

An empirical study of how supervision format, reasoning granularity, and teacher quality affect the distillation of chain-of-thought reasoning into smaller language models.

Citations

37

LLMs · CoT Distillation · Model Compression

EMNLP 2024

The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models

Y Chen, D Zhu, Y Sun, X Chen, W Zhang, X Shen

We show that more accurate reward models do not always yield better RLHF outcomes, highlighting a practical paradox between reward-model quality and downstream language-model performance.

Citations

14

RLHF · Alignment · Reward Models

NAACL 2025

Fine-grained and multi-dimensional metrics for document-level machine translation

Y Sun, D Zhu, Y Chen, E Xiao, X Chen, X Shen

A study of fine-grained evaluation signals for document-level machine translation, with the goal of measuring translation quality beyond coarse aggregate metrics.

Citations

7

Machine Translation · Evaluation · Document MT

arXiv 2024

Rethinking Soft Actor-Critic in High-Dimensional Action Spaces: The Cost of Ignoring Distribution Shift

Y Chen, X Zhang, X Wang, Z Xu, X Shen, W Zhang

We revisit Soft Actor-Critic in high-dimensional control and analyze how distribution shift in action sampling can undermine learning stability and final performance.

Citations

5

Reinforcement Learning · SAC · Control

arXiv 2025

Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning

Y Chen, Y Sun, X Chen, J Wang, X Shen, W Li, W Zhang

A study of how chain-of-thought style reasoning can improve multimodal alignment in 3D vision-language settings through more structured intermediate supervision.

Citations

3

Multimodal · Chain-of-Thought · 3D Reasoning

Full list

Full publication list

2026 · SonicBench: Dissecting the Physical Perception Bottleneck in Large Audio Language Models
Y Sun, Y Chen, X Qiu, G Zhang, H Chen, D Wu, C Li, M Yang, D Zhu, ...
arXiv preprint arXiv:2601.11039, 2026 · Citations: 0

2025 · Reasoning beyond language: A comprehensive survey on latent chain-of-thought reasoning
X Chen, A Zhao, H Xia, X Lu, H Wang, Y Chen, W Zhang, J Wang, W Li, ...
arXiv 2025 · Citations: 39

2025 · Unveiling the key factors for distilling chain-of-thought reasoning
X Chen, Z Sun, G Wenjin, M Zhang, Y Chen, Y Sun, H Su, Y Pan, ...
ACL Findings 2025 · Citations: 37

2025 · Fine-grained and multi-dimensional metrics for document-level machine translation
Y Sun, D Zhu, Y Chen, E Xiao, X Chen, X Shen
NAACL 2025 · Citations: 7

2025 · Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning
Y Chen, Y Sun, X Chen, J Wang, X Shen, W Li, W Zhang
arXiv 2025 · Citations: 3

2025 · LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model
Y Sun, Y Geng, P Wei, Y Chen, J Yang, R Chen, W Zhang, X Shen
arXiv preprint arXiv:2508.15418, 2025 · Citations: 2

2025 · Breaking the pre-planning barrier: Real-time adaptive coordination of mission and charging UAVs using graph reinforcement learning
Y Hu, Y Sun, Y Chen, X Chen
arXiv e-prints, arXiv:2501.14488, 2025 · Citations: 1

2025 · PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks
Y Liu, D Zhu, Z Al-Khalili, D Cheng, Y Chen, D Klakow, W Zhang, X Shen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language …, 2025 · Citations: 0

2025 · MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos
X Wang, X Zhang, Y Chen, X Shen, W Zhang
arXiv preprint arXiv:2505.08367, 2025 · Citations: 0

2024 · The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
Y Chen, D Zhu, Y Sun, X Chen, W Zhang, X Shen
EMNLP 2024 · Citations: 14

2024 · Rethinking Soft Actor-Critic in High-Dimensional Action Spaces: The Cost of Ignoring Distribution Shift
Y Chen, X Zhang, X Wang, Z Xu, X Shen, W Zhang
arXiv 2024 · Citations: 5

Projects

Open source and reproducibility.

Selected repositories and reproducible research artifacts related to my publications. This section is especially useful for readers who want to evaluate implementation quality or explore follow-up work.

ACL Findings 2025

Unveiling the key factors for distilling chain-of-thought reasoning

An empirical study of how supervision format, reasoning granularity, and teacher quality affect the distillation of chain-of-thought reasoning into smaller language models.

LLMs · CoT Distillation · Model Compression

EMNLP 2024

The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models

We show that more accurate reward models do not always yield better RLHF outcomes, highlighting a practical paradox between reward-model quality and downstream language-model performance.

RLHF · Alignment · Reward Models

Teaching

Teaching and mentoring.

Teaching is a core part of academic life. This section documents course support, mentoring activities, and teaching materials as they become available.

Coming soon

Teaching

Course, TA, and instructional experience will be documented here. A fuller teaching record and teaching materials page are planned.

Coming soon

Mentoring

Supervision, informal mentoring, reading groups, and project guidance will be added as this section grows.

Service

Academic and professional service.

Beyond papers and code, this section documents contributions to the broader research community through academic and professional service.

Coming soon

Academic service

Conference reviewing, workshop organization, and community service contributions will be listed here when ready for publication.

Coming soon

Professional activities

Reading groups, research communities, and other forms of scholarly engagement can be added here over time.

Honors

Awards, honors, and milestones.

This section will be used to record distinctions and major milestones in a format that is easy for academic and industry readers to scan.

Coming soon

Awards and honors

Scholarships, distinctions, awards, and recognitions will be collected here as a concise record.

Coming soon

Selected milestones

Notable research milestones, invited activities, or other professional highlights can also be summarized in this section.

CV

Curriculum vitae and materials.

A dedicated CV link is planned as part of this homepage. Until a finalized public version is added, please contact me directly if you need a current CV or supporting materials.

Coming soon

Public CV download

A polished public CV and selected supporting materials will be linked here in a later update.

Request by email

Writing

Selected writing and research notes

View all

Contact

Open to collaboration and research conversations.

I welcome academic collaboration, visiting opportunities, and research-oriented industry discussions related to RLHF, LLM reasoning, reward modeling, and embodied AI. Email is the best first contact.