Abstract

Monocular egocentric 3D human motion capture remains a significant challenge, particularly under the low-lighting and fast-motion conditions common in head-mounted device applications. Existing methods that rely on RGB cameras often fail under these conditions. To address these limitations, we introduce EventEgo3D++, the first approach that leverages a monocular event camera with a fisheye lens for 3D human motion capture. Thanks to their high temporal resolution, event cameras excel in high-speed scenarios and under varying illumination, providing reliable cues for accurate 3D human motion capture. EventEgo3D++ processes the event stream in the LNES representation to enable precise 3D reconstructions. We have also developed a mobile head-mounted device (HMD) prototype equipped with an event camera and captured a comprehensive dataset that includes real event observations from both controlled studio environments and in-the-wild settings, in addition to a synthetic dataset. To make the dataset more holistic, we additionally provide allocentric RGB streams that offer different perspectives of the HMD wearer, along with the corresponding SMPL body models. Our experiments demonstrate that EventEgo3D++ achieves superior 3D accuracy and robustness compared to existing solutions, even in challenging conditions. Moreover, our method supports real-time 3D pose updates at a rate of 140 Hz. This work extends the EventEgo3D approach (CVPR 2024) and further advances the state of the art in egocentric 3D human motion capture.

Datasets

EE3D-S

EE3D-S is a synthetic dataset generated with Blender; it contains egocentric RGB frames, event streams, 3D joint annotations, SMPL body model data, and segmentation masks, rendered at 480 fps.
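
This page does not detail how the event stream is simulated from the 480 fps renders; the snippet below is a minimal, ESIM-style sketch in which a per-pixel log-intensity change beyond a hypothetical contrast threshold C triggers an event. Assigning every event its frame's timestamp and firing at most one event per pixel per frame are simplifications, not the exact simulator used.

import numpy as np

C = 0.2  # hypothetical contrast threshold; real simulators tune this per sensor

def frames_to_events(frames, timestamps):
    """Derive (x, y, t, polarity) events from high-frame-rate renders.

    frames: (T, H, W) array of linear intensities from the 480 fps render.
    timestamps: (T,) array of frame times in seconds.
    """
    log_ref = np.log(frames[0] + 1e-6)  # per-pixel reference log-intensity
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        diff = np.log(frame + 1e-6) - log_ref
        fired = np.abs(diff) >= C
        ys, xs = np.nonzero(fired)
        ts = np.full(xs.shape, t, dtype=np.float64)
        pols = (diff[fired] > 0).astype(np.float64)
        events.append(np.stack([xs, ys, ts, pols], axis=1))
        # Advance the reference only where events fired (threshold-sized steps).
        log_ref[fired] += np.sign(diff[fired]) * C
    return np.concatenate(events) if events else np.empty((0, 4))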

EE3D-R

EE3D-R is recorded with our HMD alongside approximately 30 cameras in a studio with uniform lighting, and it is annotated with 3D joint positions, SMPL body model data, and segmentation masks.
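
Multi-view 2D joint detections are commonly lifted to 3D annotations by triangulation. The following is a minimal direct-linear-transform (DLT) sketch, assuming calibrated (3, 4) projection matrices; it illustrates the standard technique rather than the exact annotation pipeline used for EE3D-R.

import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Triangulate one 3D joint from multi-view 2D observations (DLT).

    proj_mats: list of (3, 4) camera projection matrices.
    points_2d: list of (u, v) pixel observations, one per camera.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The homogeneous 3D point is the right singular vector of the stacked
    # system with the smallest singular value.
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]
    return X[:3] / X[3]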

EE3D-W

EE3D-W is recorded with our HMD alongside up to 6 RGB cameras across three different environments: an indoor setting, an outdoor grass area, and an outdoor concrete floor.

3D Joint Annotations

Annotation with SMPL

Egocentric Segmentation Masks

Method

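EventEgo3D++ operates on the LNES representation of the event stream. As a minimal sketch, the snippet below follows the common LNES formulation, in which each pixel of a two-channel frame stores the normalized timestamp of the most recent event of the corresponding polarity within a time window; the array layout of the events is an assumption.

import numpy as np

def events_to_lnes(events, height, width, t_start, t_end):
    """Convert one event window to a (2, H, W) LNES frame.

    events: (N, 4) time-ordered array of (x, y, t, polarity), polarity in {0, 1}.
    Each pixel of the matching polarity channel keeps the normalized
    timestamp of the most recent event at that location.
    """
    lnes = np.zeros((2, height, width), dtype=np.float32)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    p = events[:, 3].astype(np.int64)
    # Fancy assignment keeps the last write per index, i.e. the latest event.
    lnes[p, y, x] = (events[:, 2] - t_start) / (t_end - t_start)
    return lnes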

Results

Our method excels at predicting 3D poses even in challenging scenarios, including fast motion and low-light conditions, where traditional RGB-based methods often struggle.
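
As a rough sketch of how the 140 Hz update rate maps onto event processing, the stream can be split into consecutive short windows, each converted to an LNES frame and fed to the pose network. Window length, image size, and the model call below are illustrative assumptions, not the released pipeline.

import numpy as np

RATE_HZ = 140                    # pose update rate reported above
WINDOW_US = int(1e6 / RATE_HZ)   # one event window per update (~7.1 ms)

def sliding_windows(events, t0, t1):
    """Yield consecutive WINDOW_US-long event windows from a recording.

    events: (N, 4) time-ordered array of (x, y, t, polarity), t in microseconds.
    """
    start = t0
    while start < t1:
        end = start + WINDOW_US
        mask = (events[:, 2] >= start) & (events[:, 2] < end)
        yield events[mask], start, end
        start = end

# Hypothetical usage with the LNES helper above; 'model' stands in for a
# pose network and is not the released one.
# for window, ts, te in sliding_windows(events, t0, t1):
#     pose3d = model(events_to_lnes(window, 480, 640, ts, te))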

BibTeX

@misc{millerdurai2025eventego3d3dhumanmotion,
      title={EventEgo3D++: 3D Human Motion Capture from a Head-Mounted Event Camera}, 
      author={Christen Millerdurai and Hiroyasu Akada and Jian Wang and Diogo Luvizon and Alain Pagani and Didier Stricker and Christian Theobalt and Vladislav Golyanik},
      year={2025},
      eprint={2502.07869},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.07869}, 
}