Debate-Driven Multi-Agent LLMs for Phishing Email Detection

Abstract

Phishing attacks remain a critical cybersecurity threat. Attackers constantly refine their methods, making phishing emails harder to detect. Traditional detection methods, including rule-based systems and supervised machine learning models, either rely on predefined patterns such as blacklists, which can be bypassed with slight modifications, or require large training datasets and can still generate false positives and false negatives. In this work, we propose a multi-agent large language model (LLM) prompting technique that simulates debates among agents to determine whether the content of an email is phishing. Our approach uses two LLM agents to present arguments for and against classifying the email as phishing, with a judge agent adjudicating the final verdict based on the quality of the reasoning provided. This debate mechanism enables the models to critically analyze contextual cues and deceptive patterns in text, which leads to improved classification accuracy. We evaluate the proposed framework on multiple phishing email datasets and demonstrate that mixed-agent configurations consistently outperform homogeneous configurations. Results also show that the debate structure itself is sufficient to yield accurate decisions without extra prompting strategies.
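The debate structure described above can be illustrated with a minimal sketch. It is not the paper's implementation. The function name `debate_classify`, the `LLM` callable type, the prompt wording, the number of rounds, and the one-word verdict format are all assumptions made for illustration. Each agent is modeled as a plain function from a prompt string to a reply string, so any model client could be wrapped to fit.

```python
from typing import Callable, List

# Hypothetical interface: an agent is any function mapping a prompt to a reply.
LLM = Callable[[str], str]


def debate_classify(email: str, pro: LLM, con: LLM, judge: LLM,
                    rounds: int = 2) -> str:
    """Run a short two-sided debate and return 'phishing' or 'legitimate'.

    `pro` argues the email is phishing, `con` argues it is legitimate,
    and `judge` reads the full transcript and issues the verdict.
    The round count and prompts are illustrative assumptions.
    """
    transcript: List[str] = []
    sides = (("A", pro, "is phishing"), ("B", con, "is legitimate"))

    for _ in range(rounds):
        for name, agent, stance in sides:
            # Each agent sees the email plus everything said so far.
            prompt = (
                f"Email:\n{email}\n\n"
                "Debate so far:\n" + "\n".join(transcript) + "\n\n"
                f"Argue that the email {stance}."
            )
            transcript.append(f"Agent {name}: {agent(prompt)}")

    verdict = judge(
        f"Email:\n{email}\n\n"
        "Debate:\n" + "\n".join(transcript) + "\n\n"
        "Answer with exactly one word: phishing or legitimate."
    )
    # Naive parsing (assumption): anything not starting with 'phishing'
    # is treated as 'legitimate'.
    if verdict.strip().lower().startswith("phishing"):
        return "phishing"
    return "legitimate"
```

In this sketch, a mixed-agent configuration would correspond to passing callables backed by different models for `pro`, `con`, and `judge`, while a homogeneous configuration would pass the same model for all three.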