Abstract
Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning capabilities through long chain-of-thought (CoT) reasoning. The R1 distillation scheme has emerged as a promising approach for training cost-effective models with enhanced reasoning abilities. However, the underlying mechanisms driving its effectiveness remain unclear. This study examines the universality of distillation data and identifies key components that enable the efficient transfer of long-chain reasoning capabilities in LLM distillation. Our findings reveal that the effectiveness of long CoT reasoning distillation from teacher models like Qwen-QwQ degrades significantly on nonhomologous models, challenging the assumed universality of current distillation methods. To gain deeper insights into the structure and patterns of long CoT reasoning, we propose DLCoT (Deconstructing Long Chain-of-Thought), a distillation data enhancement framework. DLCoT consists of three key steps: (1) data segmentation to decompose complex long CoT structures, (2) simplification by eliminating unsolvable and redundant solutions, and (3) optimization of intermediate error states. Our approach significantly improves model performance and token efficiency, facilitating the development of high-performance LLMs.
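To make the three steps concrete, the following is a minimal illustrative sketch and not the implementation described in this work. Everything specific in it is an assumption introduced for illustration: the split markers ("Alternatively", "Wait"), the "answer is X" pattern used to read off a segment's conclusion, treating a segment with no stated answer as a stand-in for an unsolvable attempt, treating a repeated answer as redundant, and handling intermediate error states by keeping only the last `keep_errors` incorrect attempts. The names `Segment`, `segment`, `simplify`, `trim_errors`, and `deconstruct` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed segment boundaries; the abstract does not specify how segments are delimited.
SPLIT_RE = re.compile(r"(?=\b(?:Alternatively|Wait)\b)")
# Assumed convention for how a segment states its conclusion.
ANSWER_RE = re.compile(r"answer is\s+([^\s.,;]+)", re.IGNORECASE)


@dataclass
class Segment:
    text: str
    answer: Optional[str]  # None when the segment never states an answer


def segment(cot: str) -> List[Segment]:
    """Step 1 (sketch): split a long CoT into candidate solution attempts."""
    pieces = [p.strip() for p in SPLIT_RE.split(cot) if p.strip()]
    segments = []
    for piece in pieces:
        match = ANSWER_RE.search(piece)
        segments.append(Segment(piece, match.group(1) if match else None))
    return segments


def simplify(segments: List[Segment]) -> List[Segment]:
    """Step 2 (sketch): drop attempts with no answer and repeated answers."""
    seen = set()
    kept = []
    for seg in segments:
        if seg.answer is None or seg.answer in seen:
            continue
        seen.add(seg.answer)
        kept.append(seg)
    return kept


def trim_errors(segments: List[Segment], reference: str,
                keep_errors: int = 1) -> List[Segment]:
    """Step 3 (sketch): keep at most the last `keep_errors` incorrect attempts."""
    wrong = [i for i, s in enumerate(segments) if s.answer != reference]
    drop = set(wrong[:-keep_errors]) if keep_errors > 0 else set(wrong)
    return [s for i, s in enumerate(segments) if i not in drop]


def deconstruct(cot: str, reference: str, keep_errors: int = 1) -> List[Segment]:
    """Chain the three sketched steps in order."""
    return trim_errors(simplify(segment(cot)), reference, keep_errors)


# Toy input written for this sketch; not data from the paper.
EXAMPLE_COT = (
    "Try factoring: the answer is 5. "
    "Alternatively, expand the square but this gets stuck. "
    "Wait, recheck by substitution: the answer is 5. "
    "Alternatively, guess small values: the answer is 7. "
    "Alternatively, use the quadratic formula: the answer is 6."
)
```

On the toy input, `deconstruct(EXAMPLE_COT, reference="6")` retains one incorrect attempt followed by the correct one, which is one possible reading of how a shortened trace could still expose the student model to an error and its correction.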