Multi-objective Good Arm Identification with Bandit Feedback

Abstract

We consider a good arm identification problem in a stochastic bandit setting with multiple objectives, where each arm $i\in[K]$ is associated with $M$ distributions $\mathcal{D}_i^{(1)}, \ldots, \mathcal{D}_i^{(M)}$. In each round $t$, the player/algorithm pulls one arm $i_t$ and receives a vector feedback, where each component $m$ is sampled according to $\mathcal{D}_{i_t}^{(m)}$. The target is twofold: either find, with bounded sample complexity, one arm whose means are larger than the predefined thresholds $\xi_1,\ldots,\xi_M$ with a confidence bound $\delta$ and an accuracy rate $\epsilon$, or output $\bot$ to indicate that no such arm exists. We propose an algorithm with a sample complexity bound. When $M=1$ and $\epsilon = 0$, our bound is the same as the one given in the previous work, and we obtain novel bounds for $M > 1$. The proposed algorithm attains better numerical performance than other baselines in experiments on synthetic and real datasets.
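
To make the feedback model concrete, the following is a minimal sketch of the setting only, not of the proposed algorithm. It rests on several assumptions that the abstract does not state: feedback components are Bernoulli, an arm counts as good when every mean is at least its threshold minus $\epsilon$, and the names `pull`, `is_good`, `means`, and `thresholds` are hypothetical.

```python
import numpy as np


def pull(means, arm, rng):
    """Return one M-dimensional feedback vector for the pulled arm.

    Assumption: component m is Bernoulli with mean means[arm, m]; the
    abstract only says it is sampled from the arm's m-th distribution.
    """
    return (rng.random(means.shape[1]) < means[arm]).astype(float)


def is_good(means, arm, thresholds, eps=0.0):
    """Assumed reading of the goal: every mean clears its threshold, up to eps."""
    return bool(np.all(means[arm] >= thresholds - eps))


# Hypothetical instance with K = 3 arms and M = 2 objectives.
means = np.array([[0.2, 0.9],
                  [0.7, 0.6],
                  [0.4, 0.3]])
thresholds = np.array([0.5, 0.5])
rng = np.random.default_rng(0)

feedback = pull(means, arm=1, rng=rng)  # one vector of M observations
good_arms = [i for i in range(len(means)) if is_good(means, i, thresholds)]
```

In this toy instance only the second arm clears both thresholds, so a correct identification procedure should return it with probability at least $1-\delta$; if no arm qualified, the expected output would be $\bot$.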