Jinghang Li*,1, Shichao Li*,2, Qing Lian2, Peiliang Li2, Xiaozhi Chen2, Yi Zhou†,1
1 Neuromorphic Automation and Intelligence Lab (NAIL), Hunan University
2 Zhuoyu Technology
* Equal contribution † Corresponding author: Yi Zhou
Abstract
Recent visual autonomous perception systems achieve remarkable performance with deep representation learning, yet they fail in scenarios with challenging illumination. While event cameras can mitigate this problem, a large-scale dataset for developing event-enhanced deep visual perception models in autonomous driving scenes has been lacking. To address this gap, we present the eAP (event-enhanced Autonomous Perception) dataset, the largest event-camera dataset for autonomous perception. We demonstrate how eAP facilitates the study of different autonomous perception tasks through deep representation learning, including 3D vehicle detection and object time-to-contact (TTC) estimation. Based on eAP, we demonstrate the first successful use of events to improve a popular 3D vehicle detection network under challenging illumination. eAP also enables a dedicated study of the representation learning problem of object TTC estimation: we show how a geometry-aware representation learning framework yields the best event-based object TTC estimation network, operating at 200 FPS. The dataset, code, and pre-trained models will be made publicly available for future research.
At a Glance
Data Modalities
All sensors are hardware-synchronized at sub-microsecond precision using IEEE PTP and a 10 Hz trigger signal.
- **RGB camera:** FLIR Blackfly S at 1920×1200, 10 Hz. Auto-exposure with exposure-time priority (1–20 ms) for minimal motion blur.
- **Event camera:** Prophesee EVK4 at 1280×720. Microsecond temporal resolution with high dynamic range, ideal for challenging illumination.
- **LiDAR:** 3× Livox TELE-15 (320 m range) + 1× Livox Mid-360 for the dense point clouds used in 3D annotation.
- **GNSS/INS:** u-blox ZED-F9K with RTK (0.2 m position accuracy). Ego-pose via tightly-coupled GNSS-visual-inertial fusion (GVINS).
Sensor configuration: the event camera and RGB camera are rigidly mounted with a narrow 3 cm baseline.
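To illustrate how the raw event stream above can feed a deep representation learning pipeline, here is a minimal sketch that accumulates events into a temporal voxel grid, a common input representation for event-based networks. The array names and units are illustrative only, not the released file schema.

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, num_bins=5, height=720, width=1280):
    """Accumulate an event stream into a (num_bins, H, W) voxel grid.

    xs, ys : per-event pixel coordinates
    ts     : per-event timestamps (e.g. microseconds)
    ps     : per-event polarities in {-1, +1}
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t0, t1 = ts[0], ts[-1]
    # Normalize timestamps to [0, num_bins - 1] and assign each event a bin.
    tn = (ts - t0) / max(t1 - t0, 1) * (num_bins - 1)
    bins = np.clip(tn.astype(np.int64), 0, num_bins - 1)
    # Scatter-add polarities; np.add.at handles repeated (bin, y, x) indices.
    np.add.at(grid, (bins, ys, xs), ps.astype(np.float32))
    return grid

# Toy example: four events on a 1280x720 sensor over 300 us.
xs = np.array([10, 11, 12, 13])
ys = np.array([20, 20, 21, 21])
ts = np.array([0, 100, 200, 300])
ps = np.array([1, -1, 1, 1])
grid = events_to_voxel_grid(xs, ys, ts, ps)
print(grid.shape)  # (5, 720, 1280)
```

The EVK4's 1280×720 resolution is used for the default grid shape; the number of temporal bins is a free design choice of the downstream network.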
Dataset Details
Diverse driving scenarios spanning highways, urban roads, and low-light conditions across different times of day and weather.
| Region | Distance | Illumination | Sequences (Train/Test) | Duration (Train/Test) |
|---|---|---|---|---|
| Highways | 178.44 km | Sunny | 13 / 3 | 65 / 15 min |
| | | Cloudy | 10 / 1 | 50 / 5 min |
| | | Twilight | 7 / 2 | 35 / 10 min |
| Urban | 52.61 km | Sunny | 5 / 1 | 25 / 5 min |
| | | Cloudy | 4 / 1 | 20 / 5 min |
| | | Twilight | 1 / 1 | 5 / 5 min |
| | | Night | 5 / 2 | 25 / 10 min |
| Low-light | 5.01 km | Night | 1 / 1 | 5 / 5 min |
| Total | 236.06 km | — | 46 / 12 | 230 / 60 min |
Annotations
Pre-labeled by a BEVFusion model trained on 500k in-house frames, tracked with a 3D Kalman filter, then manually verified and corrected by human annotators.
7-DoF cuboid annotations (x, y, z, l, w, h, yaw) in ego-vehicle coordinate system with front-camera projection.
Consistent tracking IDs across frames with 11-dimensional state vectors (location, orientation, size, velocity, angular velocity).
Per-object ego-relative speed vectors and time-to-contact (TTC) ground truth: τ = min(Z) / v_rel, where min(Z) is the nearest depth of the object cuboid and v_rel the ego-relative closing speed.
Full intrinsic/extrinsic calibration via Kalibr and Calib-Anything. Sub-microsecond temporal alignment. Narrow-baseline (<5 px disparity) RGB-event mapping.
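The TTC ground truth can be reproduced from a cuboid annotation in a few lines. The sketch below assumes the cuboid is given by its eight corners in the ego frame with Z pointing forward and the closing speed resolved along Z; field names and conventions are illustrative, not the released schema.

```python
import numpy as np

def ttc_from_annotation(corners_ego, v_rel):
    """Time-to-contact tau = min(Z) / v_rel.

    corners_ego : (8, 3) cuboid corners in the ego frame, Z forward (m)
    v_rel       : ego-relative closing speed along Z (m/s, > 0 when approaching)
    """
    z_min = corners_ego[:, 2].min()  # depth of the nearest cuboid point
    if v_rel <= 0:
        return np.inf  # object not approaching: TTC is unbounded
    return z_min / v_rel

# Toy cuboid 30 m ahead, closing at 10 m/s -> tau = 3 s.
corners = np.array(
    [[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (30, 34)], float
)
print(ttc_from_annotation(corners, v_rel=10.0))  # 3.0
```

Taking min(Z) over the cuboid corners makes τ conservative: it measures time until the nearest point of the object reaches the ego plane, not its center.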
3D cuboid annotations projected on RGB and event views across diverse driving scenarios and illumination conditions.
BEV of LiDAR point cloud with 3D boxes and velocity curves of object trajectories.
Projected LiDAR point cloud on RGB image demonstrating calibration quality.
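The projection shown above follows the standard pinhole model. A minimal sketch, assuming a 3×3 intrinsic matrix K and a LiDAR-to-camera extrinsic [R | t]; the calibration values below are placeholders, not the released parameters.

```python
import numpy as np

def project_lidar_to_image(points, K, R, t):
    """Project Nx3 LiDAR points into pixel coordinates.

    Returns (u, v) pixels for the kept points and a mask of points
    in front of the camera.
    """
    cam = points @ R.T + t           # LiDAR frame -> camera frame
    in_front = cam[:, 2] > 0.1       # keep points with positive depth
    uvw = cam[in_front] @ K.T        # pinhole projection to homogeneous pixels
    uv = uvw[:, :2] / uvw[:, 2:3]    # perspective divide
    return uv, in_front

# Placeholder calibration: identity rotation, zero translation,
# principal point at the image center of a 1920x1200 frame.
K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 600.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
pts = np.array([[0.0, 0.0, 10.0], [1.0, 0.0, 10.0]])
uv, mask = project_lidar_to_image(pts, K, R, t)
print(uv)  # [[960. 600.] [1060. 600.]]
```

In practice the points would first be motion-compensated to the camera trigger time using the ego-pose before applying the extrinsic.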
Showcase
The examples below are 10-second previews rendered directly from the released assets. They are grouped by road type.
Downloads
Each asset is distributed as a standalone <asset_id>.zip. The main table covers all assets, and the HDR table below groups all HDR assets in one place.
Citation
```bibtex
@misc{li2026eap,
  title         = {Toward Deep Representation Learning for Event-Enhanced
                   Visual Autonomous Perception: the eAP Dataset},
  author        = {Li, Jinghang and Li, Shichao and Lian, Qing
                   and Li, Peiliang and Chen, Xiaozhi and Zhou, Yi},
  year          = {2026},
  eprint        = {2603.16303},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2603.16303},
}
```