Deceptive Humor: A Synthetic Multilingual Benchmark Dataset for Bridging Fabricated Claims with Humorous Content

Abstract

This paper presents the Deceptive Humor Dataset (DHD), a novel resource for studying humor derived from fabricated claims and misinformation. In an era of rampant misinformation, understanding how humor intertwines with deception is essential. DHD consists of humor-infused comments generated with the ChatGPT-4o model from false narratives that incorporate fabricated claims and manipulated information. Each instance is labeled with a Satire Level, ranging from 1 (subtle satire) to 3 (high-level satire), and classified into five distinct Humor Categories: Dark Humor, Irony, Social Commentary, Wordplay, and Absurdity. The dataset spans multiple languages, including English, Telugu, Hindi, Kannada, and Tamil, as well as their code-mixed variants (Te-En, Hi-En, Ka-En, Ta-En), making it a valuable multilingual benchmark. By introducing DHD, we establish a structured foundation for analyzing humor in deceptive contexts, paving the way for a new research direction that explores how humor not only interacts with misinformation but also influences its perception and spread. We also provide strong baselines on the proposed dataset, giving future work a reference point for benchmarking and advancing deceptive humor detection models.
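To make the annotation scheme concrete, the following is a minimal sketch of how a single DHD instance could be represented and validated in Python. It encodes only what the abstract states: a Satire Level from 1 to 3, the five Humor Categories, and the nine monolingual and code-mixed language tags. The class name `DHDExample`, the field names, and the assumption that each instance carries exactly one humor category are hypothetical and do not reflect the released dataset's actual file format.

```python
from dataclasses import dataclass
from enum import Enum


class HumorCategory(Enum):
    """The five humor categories named in the abstract."""
    DARK_HUMOR = "Dark Humor"
    IRONY = "Irony"
    SOCIAL_COMMENTARY = "Social Commentary"
    WORDPLAY = "Wordplay"
    ABSURDITY = "Absurdity"


# Monolingual and code-mixed language tags listed in the abstract.
LANGUAGES = frozenset({
    "English", "Telugu", "Hindi", "Kannada", "Tamil",
    "Te-En", "Hi-En", "Ka-En", "Ta-En",
})


@dataclass(frozen=True)
class DHDExample:
    """Hypothetical record layout for one DHD instance (field names assumed)."""
    text: str                       # humor-infused comment built on a fabricated claim
    satire_level: int               # 1 = subtle satire ... 3 = high-level satire
    humor_category: HumorCategory   # assumption: exactly one category per instance
    language: str                   # one of the tags in LANGUAGES

    def __post_init__(self) -> None:
        # Reject labels outside the ranges described in the abstract.
        if self.satire_level not in (1, 2, 3):
            raise ValueError(f"satire_level must be 1, 2, or 3, got {self.satire_level}")
        if self.language not in LANGUAGES:
            raise ValueError(f"unknown language tag: {self.language!r}")
```

For example, `DHDExample("placeholder comment", 2, HumorCategory.IRONY, "Te-En")` builds a valid record, while a Satire Level of 4 raises a `ValueError`. Validating labels at construction time keeps malformed annotations from silently reaching a baseline model.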