Test1
2025-02-28
🧩 Background
Chain-of-Thought (CoT) prompting enables powerful multi-step reasoning in LLMs. But transferring this ability into small language models (SLMs) remains a major challenge.
🧪 Experiment Design
We study three axes:
- Granularity of intermediate steps
- Supervision format: explanation-only vs answer+reasoning
- Teacher model: GPT-3.5 vs GPT-4
Seven datasets are used, including GSM8K, SVAMP, and DROP; a minimal sketch of the supervision formats follows below.
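
To make the supervision-format axis concrete, here is a minimal Python sketch of how the formats compared here could be serialized into fine-tuning targets for a student SLM. The record layout, the `prompt`/`completion` field names, and the step delimiter are illustrative assumptions, not the exact format used in these experiments.

```python
# Sketch: turn a teacher-annotated example into a training record for one of
# the supervision formats discussed above. Field names and prompt wording are
# assumptions made for illustration only.
import json

def build_target(question: str, steps: list[str], answer: str, fmt: str) -> dict:
    """Return one training record for the given supervision format."""
    if fmt == "answer_only":
        completion = f"Answer: {answer}"                       # no reasoning shown
    elif fmt == "explanation_only":
        completion = "\n".join(steps)                          # rationale without the final answer
    elif fmt == "answer_plus_reasoning":
        completion = "\n".join(steps) + f"\nAnswer: {answer}"  # steps followed by the answer
    else:
        raise ValueError(f"unknown format: {fmt}")
    return {"prompt": f"Question: {question}\nLet's think step by step.",
            "completion": completion}

example = build_target(
    question="A farmer has 3 pens with 12 chickens each. How many chickens in total?",
    steps=["There are 3 pens.", "Each pen holds 12 chickens.", "3 * 12 = 36."],
    answer="36",
    fmt="answer_plus_reasoning",
)
print(json.dumps(example, indent=2))
```

Varying `fmt` while holding the questions fixed is one straightforward way to isolate the effect of supervision format from the other two axes.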
📊 Key Findings
- Fine-grained step supervision leads to more generalizable reasoning
- Answer-only distillation fails to transfer reasoning skills (see the scoring sketch below)
- GPT-4 outputs yield more structured logic chains than GPT-3.5
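
A note on how findings like these are typically scored: for GSM8K-style benchmarks, a common convention is to extract the final number from the student's generated chain and compare it to the gold answer. The regex and the "last number wins" rule below follow that convention and are assumptions, not details taken from this write-up.

```python
# Sketch: final-answer exact-match scoring for a generated reasoning chain.
# The extraction rule is a common convention, assumed here for illustration.
import re

def extract_final_answer(generation: str) -> str | None:
    """Return the last number appearing in the generated text, or None."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation.replace(",", ""))
    return numbers[-1] if numbers else None

def exact_match(generation: str, gold: str) -> bool:
    """Score a single example by final-answer exact match."""
    pred = extract_final_answer(generation)
    return pred is not None and float(pred) == float(gold)

# Toy check: a step-by-step chain vs. a bare answer-only output.
chain = "There are 3 pens. Each pen holds 12 chickens. 3 * 12 = 36. Answer: 36"
bare = "The answer is 38."
print(exact_match(chain, "36"), exact_match(bare, "36"))  # True False
```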
💡 Takeaway
Distillation is not copying; it's translation.
“The way we teach reasoning determines what reasoning emerges.”