Abstract
Feature selection aims to preprocess the target dataset, find an optimal and most streamlined feature subset, and enhance the downstream machine learning task. Among filter, wrapper, and embedded-based approaches, the reinforcement learning (RL)-based subspace exploration strategy provides a novel objective-optimization-directed perspective and promising performance. Nevertheless, even with improved performance, current reinforcement learning approaches face challenges similar to those of conventional methods when dealing with complex datasets. These challenges stem from the inefficient paradigm of using one agent per feature and from the inherent complexities present in the datasets. This observation motivates us to investigate and address these issues and to propose a novel approach, namely HRLFS. Our methodology first employs a Large Language Model (LLM)-based hybrid state extractor to capture each feature's mathematical and semantic characteristics. Based on this information, features are clustered, facilitating the construction of hierarchical agents for each cluster and sub-cluster. Extensive experiments demonstrate the efficiency, scalability, and robustness of our approach. Compared to contemporary methods and one-feature-one-agent RL-based approaches, HRLFS improves downstream ML performance through iterative feature subspace exploration while shortening total run time by reducing the number of agents involved.