TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection

  • 2025-03-31 14:06:17
  • Zhiming Ma, Peidong Wang, Minhua Huang, Jingpeng Wang, Kai Wu, Xiangzhao Lv, Yachun Pang, Yin Yang, Wenjie Tang, Yuchen Kang

Abstract

The detection of telecom fraud faces significant challenges due to the lack of high-quality multimodal training data that integrates audio signals with reasoning-oriented textual analysis. To address this gap, we present TeleAntiFraud-28k, the first open-source audio-text slow-thinking dataset specifically designed for automated telecom fraud analysis. Our dataset is constructed through three strategies: (1) Privacy-preserved text-truth sample generation using automatic speech recognition (ASR)-transcribed call recordings (with anonymized original audio), ensuring real-world consistency through text-to-speech (TTS) model regeneration; (2) Semantic enhancement via large language model (LLM)-based self-instruction sampling on authentic ASR outputs to expand scenario coverage; (3) Multi-agent adversarial synthesis that simulates emerging fraud tactics through predefined communication scenarios and fraud typologies. The generated dataset contains 28,511 rigorously processed speech-text pairs, complete with detailed annotations for fraud reasoning. The dataset is divided into three tasks: scenario classification, fraud detection, and fraud type classification. Furthermore, we construct TeleAntiFraud-Bench, a standardized evaluation benchmark comprising proportionally sampled instances from the dataset, to facilitate systematic testing of model performance on telecom fraud detection tasks. We also contribute a production-optimized supervised fine-tuning (SFT) model trained on hybrid real/synthetic data, while open-sourcing the data processing framework to enable community-driven dataset expansion. This work establishes a foundational framework for multimodal anti-fraud research while addressing critical challenges in data privacy and scenario diversity. The project will be released at https://github.com/JimmyMa99/TeleAntiFraud.
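
The abstract states that TeleAntiFraud-Bench is built from proportionally sampled instances but does not describe the procedure. As a rough illustration only, the sketch below shows one common way to do proportional (stratified) sampling in Python. The function name `proportional_sample`, the `fraud_type` field, the label values, and the 10% fraction are all hypothetical and are not taken from the paper or its repository.

```python
import random
from collections import defaultdict


def proportional_sample(records, key, fraction, seed=0):
    """Draw roughly `fraction` of the records from each group defined by `key`.

    Illustrative sketch only; not the authors' benchmark construction code.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    sample = []
    for label in sorted(groups):
        items = groups[label]
        # Keep at least one instance per group so rare classes stay represented.
        k = max(1, round(len(items) * fraction))
        sample.extend(rng.sample(items, k))
    return sample


# Toy records; the field name and label values are hypothetical.
records = (
    [{"id": i, "fraud_type": "impersonation"} for i in range(60)]
    + [{"id": i, "fraud_type": "investment"} for i in range(60, 90)]
    + [{"id": i, "fraud_type": "none"} for i in range(90, 100)]
)
bench = proportional_sample(records, key="fraud_type", fraction=0.1)
```

Sampling within each group, rather than from the pooled data, keeps the class mix of the benchmark close to that of the full dataset, which is the usual motivation for proportional sampling.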