Effectively Controlling Reasoning Models through Thinking Intervention

Abstract

Reasoning-enhanced large language models (LLMs) explicitly generate intermediate reasoning steps prior to generating final answers, helping the model excel in complex problem-solving. In this paper, we demonstrate that this emerging generation framework offers a unique opportunity for more fine-grained control over model behavior. We propose Thinking Intervention, a novel paradigm designed to explicitly guide the internal reasoning processes of LLMs by strategically inserting or revising specific thinking tokens. We conduct comprehensive evaluations across multiple tasks, including instruction following on IFEval, instruction hierarchy on SEP, and safety alignment on XSTest and SORRY-Bench. Our results demonstrate that Thinking Intervention significantly outperforms baseline prompting approaches, achieving up to 6.7% accuracy gains in instruction-following scenarios, 15.4% improvements in reasoning about instruction hierarchies, and a 40.0% increase in refusal rates for unsafe prompts using open-source DeepSeek R1 models. Overall, our work opens a promising new research avenue for controlling reasoning LLMs.
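
As a rough illustration of what inserting thinking tokens could look like in practice, the minimal sketch below pre-fills the start of the model's reasoning block with a guiding sentence, so that generation continues from that text. It is not the paper's implementation. The function name build_prompt, the plain-text "User:/Assistant:" layout, the "<think>" delimiter, and the example intervention sentence are all assumptions made for illustration; a real model would need its own chat template, and the sketch covers only insertion at the start of reasoning, not revision of existing thinking tokens.

```python
def build_prompt(instruction: str, intervention: str = "",
                 think_open: str = "<think>") -> str:
    """Return a generation prompt whose reasoning block is optionally pre-filled.

    All formatting here is hypothetical: the "User:/Assistant:" layout and the
    "<think>" delimiter stand in for whatever template a given model expects.
    """
    # Open the assistant turn and the reasoning block.
    prompt = f"User: {instruction}\nAssistant: {think_open}\n"
    # Insert the intervention as the first thinking tokens; the model would
    # then continue its reasoning from this point.
    if intervention:
        prompt += intervention.strip() + "\n"
    return prompt


instruction = "Summarize the report in exactly three sentences."

# Baseline prompting: the reasoning block starts empty.
baseline = build_prompt(instruction)

# Intervened prompt: the reasoning block starts with a guiding sentence
# (an illustrative sentence, not one taken from the paper).
intervened = build_prompt(
    instruction,
    intervention="I should make sure my answer has exactly three sentences.",
)
```

Passing intervened rather than baseline to a text-completion endpoint is the only change this sketch makes. The user instruction is left untouched, which is what would distinguish this kind of control from ordinary prompt engineering.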