emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography

  • 2025-03-20 15:51:46
  • Viswanath Sivakumar, Jeffrey Seely, Alan Du, Sean R Bittner, Adam Berenzweig, Anuoluwapo Bolarinwa, Alexandre Gramfort, Michael I Mandel

Abstract

Surface electromyography (sEMG) non-invasively measures signals generated by muscle activity with sufficient sensitivity to detect individual spinal neurons and richness to identify dozens of gestures and their nuances. Wearable wrist-based sEMG sensors have the potential to offer low friction, subtle, information rich, always available human-computer inputs. To this end, we introduce emg2qwerty, a large-scale dataset of non-invasive electromyographic signals recorded at the wrists while touch typing on a QWERTY keyboard, together with ground-truth annotations and reproducible baselines. With 1,135 sessions spanning 108 users and 346 hours of recording, this is the largest such public dataset to date. These data demonstrate non-trivial, but well defined hierarchical relationships both in terms of the generative process, from neurons to muscles and muscle combinations, as well as in terms of domain shift across users and user sessions. Applying standard modeling techniques from the closely related field of Automatic Speech Recognition (ASR), we show strong baseline performance on predicting key-presses using sEMG signals alone. We believe the richness of this task and dataset will facilitate progress in several problems of interest to both the machine learning and neuroscientific communities. Dataset and code can be accessed at https://github.com/facebookresearch/emg2qwerty.