LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates

Abstract

Recent findings reveal that much of the knowledge in a Transformer-based Large Language Model (LLM) is encoded in its feed-forward (FFN) layers, where each FFN layer can be interpreted as the summation of sub-updates, each corresponding to a weighted column vector from the FFN's value parameter matrix that often encodes human-interpretable concepts. In light of this, we hypothesize that model performance and behaviors can be further enhanced and controlled by modulating the contributions of these sub-updates based on their relevance to the input or target output style, and propose LLMBRACES, a novel and efficient method that computes relevance scores associated with value vectors in FFN layers and leverages these scores to dynamically adjust the contribution of sub-updates. By optimizing sub-update contributions, LLMBRACES refines the prediction process, leading to more accurate and reliable outputs, much like a 'brace' providing support and stability. Moreover, LLMBRACES can be extended to support conditional control over generation characteristics, such as sentiment, thereby offering fine-grained steering of LLM outputs. Extensive experiments on various LLMs, including Qwen2.5-1.5B, Llama2-7B, and Llama3-8B, demonstrate that LLMBRACES outperforms baseline approaches in both fine-tuning and zero-shot settings while requiring significantly fewer tunable parameters, up to 75% fewer than LoRA. Furthermore, LLMBRACES excels in sentiment-controlled generation and toxicity reduction, highlighting its potential for flexible, controlled text generation across applications.
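
To make the sub-update view concrete, the following is a minimal NumPy sketch, not the LLMBRACES implementation. It assumes a single FFN layer with a ReLU activation and no bias terms, and stores value vectors as columns of `W_value`. The function names, the cosine-similarity relevance score, and the `(1 + relevance)` scaling rule are all hypothetical choices made for illustration; the abstract does not specify how relevance scores are computed or applied.

```python
import numpy as np


def ffn_sub_updates(x, W_key, W_value):
    """Decompose an FFN output into per-value-vector sub-updates.

    x:       (d_model,)       hidden state entering the FFN
    W_key:   (d_ff, d_model)  first projection (assumed layout)
    W_value: (d_model, d_ff)  value matrix; column i is value vector v_i
    """
    # Activation coefficients m_i (ReLU is an assumption for this sketch).
    m = np.maximum(W_key @ x, 0.0)
    # Broadcasting scales column i of W_value by m_i, giving sub-update m_i * v_i.
    sub_updates = W_value * m
    return m, sub_updates


def cosine_relevance(x, W_value, eps=1e-8):
    """Hypothetical relevance score: cosine similarity between x and each v_i."""
    norms = np.linalg.norm(W_value, axis=0) * np.linalg.norm(x)
    return (W_value.T @ x) / (norms + eps)


def modulated_ffn(x, W_key, W_value, relevance):
    """Rescale each sub-update by (1 + relevance_i) before summing.

    The (1 + relevance) rule is an illustrative assumption, not the paper's rule.
    """
    m, _ = ffn_sub_updates(x, W_key, W_value)
    return W_value @ ((1.0 + relevance) * m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_ff = 4, 6
    x = rng.normal(size=d_model)
    W_key = rng.normal(size=(d_ff, d_model))
    W_value = rng.normal(size=(d_model, d_ff))

    m, subs = ffn_sub_updates(x, W_key, W_value)
    plain = subs.sum(axis=1)  # the FFN output is the sum of its sub-updates
    steered = modulated_ffn(x, W_key, W_value, cosine_relevance(x, W_value))
    print("plain FFN output:    ", plain)
    print("modulated FFN output:", steered)
```

With all relevance scores set to zero, `modulated_ffn` reduces to the ordinary FFN output, which is one way to see that the modulation only reweights existing sub-updates and leaves the value vectors themselves unchanged.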