ShuffleGate: An Efficient and Self-Polarizing Feature Selection Method for Large-Scale Deep Models in Industry

Abstract

Deep models in industrial applications, such as deep recommendation systems, rely on thousands of features for accurate predictions. While new features are introduced to capture evolving user behavior, outdated or redundant features often remain, significantly increasing storage and computational costs. To address this issue, feature selection methods are widely adopted to identify and remove less important features. However, existing approaches face two major challenges: (1) they often require complex hyperparameter (Hp) tuning, making them difficult to employ in practice, and (2) they fail to produce well-separated feature importance scores, which complicates straightforward feature removal. Moreover, the impact of removing unimportant features can only be evaluated by retraining the model, a time-consuming and resource-intensive process that severely hinders efficient feature selection. To solve these challenges, we propose a novel feature selection approach, ShuffleGate. In particular, it shuffles all feature values across instances simultaneously and uses a gating mechanism that allows the model to dynamically learn the weights for combining the original and shuffled inputs. Notably, it can generate well-separated feature importance scores and estimate the performance without retraining the model, while introducing only a single Hp. Experiments on four public datasets show that our approach outperforms state-of-the-art methods in feature selection for model retraining. Moreover, it has been successfully integrated into the daily iteration of Bilibili's search models across various scenarios, where it significantly reduces feature set size (up to 60%+) and computational resource usage (up to 20%+), while maintaining comparable performance.
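
The shuffle-and-gate idea described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `shuffle_gate`, the sigmoid parameterization of the gates, and the independent per-feature permutation are all hypothetical choices, and the training objective and the single Hp are not modeled here.

```python
import math
import random


def shuffle_gate(batch, gate_logits, rng):
    """Blend each feature column with a shuffled copy of itself.

    batch:       list of rows, each a list of n_features floats
    gate_logits: list of n_features floats (hypothetical learnable parameters)
    rng:         random.Random instance used for the shuffles
    """
    n_features = len(gate_logits)
    # Assumption: gates are squashed into (0, 1) with a sigmoid.
    gates = [1.0 / (1.0 + math.exp(-z)) for z in gate_logits]

    mixed_columns = []
    for j in range(n_features):
        column = [row[j] for row in batch]
        # Assumption: each feature is permuted across instances on its own,
        # which keeps its marginal distribution but breaks its link to the label.
        shuffled = column[:]
        rng.shuffle(shuffled)
        g = gates[j]
        mixed_columns.append(
            [g * original + (1.0 - g) * noise
             for original, noise in zip(column, shuffled)]
        )

    # Transpose the columns back into rows.
    return [list(row) for row in zip(*mixed_columns)]


# Usage: the first feature is kept almost intact, the second is almost fully shuffled.
rng = random.Random(0)
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
mixed = shuffle_gate(batch, gate_logits=[50.0, -50.0], rng=rng)
```

In this sketch, a gate close to 1 passes the original feature through, while a gate close to 0 replaces it with its shuffled version. Under that reading, a feature whose gate can drop toward 0 without hurting the model would be a candidate for removal.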