Jinghang Li*,1, Shichao Li*,2, Qing Lian2, Peiliang Li2, Xiaozhi Chen2, Yi Zhou†,1
1 Neuromorphic Automation and Intelligence Lab (NAIL), Hunan University
2 Zhuoyu Technology
* Equal contribution † Corresponding author: Yi Zhou
Abstract
Recent visual autonomous perception systems achieve remarkable performance with deep representation learning, yet they fail in scenarios with challenging illumination. While event cameras can mitigate this problem, a large-scale dataset for developing event-enhanced deep visual perception models in autonomous driving scenes has been lacking. To address this gap, we present the eAP (event-enhanced Autonomous Perception) dataset, the largest event-camera dataset for autonomous perception. We demonstrate how eAP facilitates the study of different autonomous perception tasks through deep representation learning, including 3D vehicle detection and object time-to-contact (TTC) estimation. Based on eAP, we demonstrate the first successful use of events to improve a popular 3D vehicle detection network under challenging illumination. eAP also enables a dedicated study of the representation learning problem of object TTC estimation: we show how a geometry-aware representation learning framework yields the best event-based object TTC estimation network, operating at 200 FPS. The dataset, code, and pre-trained models will be made publicly available for future research.
At a Glance
Data Modalities
All sensors are hardware-synchronized at sub-microsecond precision using IEEE PTP and a 10 Hz trigger signal.
- **RGB camera:** FLIR Blackfly S at 1920×1200, 10 Hz. Auto-exposure with exposure-time priority (1–20 ms) for minimal motion blur.
- **Event camera:** Prophesee EVK4 at 1280×720. Microsecond temporal resolution with high dynamic range, ideal for challenging illumination.
- **LiDAR:** 3× Livox TELE-15 (320 m range) + 1× Livox Mid-360 for the dense point clouds used in 3D annotation.
- **GNSS/INS:** u-blox ZED-F9K with RTK (0.2 m position accuracy). Ego-pose via tightly-coupled GNSS-visual-inertial fusion (GVINS).
Sensor configuration: the event camera and RGB camera are rigidly mounted with a narrow 3 cm baseline.
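To illustrate how the raw event stream above can feed a deep representation learning pipeline, here is a minimal sketch that accumulates events into a temporal voxel grid, a common input representation for event-based networks. The array names and units are illustrative only, not the released file schema.

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, num_bins=5, height=720, width=1280):
    """Accumulate an event stream into a (num_bins, H, W) voxel grid.

    xs, ys : per-event pixel coordinates
    ts     : per-event timestamps (e.g. microseconds)
    ps     : per-event polarities in {-1, +1}
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t0, t1 = ts[0], ts[-1]
    # Normalize timestamps to [0, num_bins - 1] and assign each event a bin.
    tn = (ts - t0) / max(t1 - t0, 1) * (num_bins - 1)
    bins = np.clip(tn.astype(np.int64), 0, num_bins - 1)
    # Scatter-add polarities; np.add.at handles repeated (bin, y, x) indices.
    np.add.at(grid, (bins, ys, xs), ps.astype(np.float32))
    return grid

# Toy example: four events on a 1280x720 sensor over 300 us.
xs = np.array([10, 11, 12, 13])
ys = np.array([20, 20, 21, 21])
ts = np.array([0, 100, 200, 300])
ps = np.array([1, -1, 1, 1])
grid = events_to_voxel_grid(xs, ys, ts, ps)
print(grid.shape)  # (5, 720, 1280)
```

The EVK4's 1280×720 resolution is used for the default grid shape; the number of temporal bins is a free design choice of the downstream network.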
Dataset Details
Diverse driving scenarios spanning highways, urban roads, and low-light conditions across different times of day and weather.
| Region | Distance | Illumination | Sequences (Train/Test) | Duration (Train/Test) |
|---|---|---|---|---|
| Highways | 178.44 km | Sunny | 13 / 3 | 65 / 15 min |
| | | Cloudy | 10 / 1 | 50 / 5 min |
| | | Twilight | 7 / 2 | 35 / 10 min |
| Urban | 52.61 km | Sunny | 5 / 1 | 25 / 5 min |
| | | Cloudy | 4 / 1 | 20 / 5 min |
| | | Twilight | 1 / 1 | 5 / 5 min |
| | | Night | 5 / 2 | 25 / 10 min |
| Low-light | 5.01 km | Night | 1 / 1 | 5 / 5 min |
| Total | 236.06 km | — | 46 / 12 | 230 / 60 min |
Annotations
Pre-labeled by a BEVFusion model trained on 500k in-house frames, tracked with a 3D Kalman filter, then manually verified and corrected by human annotators.
7-DoF cuboid annotations (x, y, z, l, w, h, yaw) in ego-vehicle coordinate system with front-camera projection.
Consistent tracking IDs across frames with 11-dimensional state vectors (location, orientation, size, velocity, angular velocity).
Per-object ego-relative speed vectors and time-to-contact (TTC) ground truth: τ = min(Z) / v_rel, where min(Z) is the nearest depth of the object cuboid and v_rel the ego-relative closing speed.
Full intrinsic/extrinsic calibration via Kalibr and Calib-Anything. Sub-microsecond temporal alignment. Narrow-baseline (<5 px disparity) RGB-event mapping.
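The TTC ground truth can be reproduced from a cuboid annotation in a few lines. The sketch below assumes the cuboid is given by its eight corners in the ego frame with Z pointing forward and the closing speed resolved along Z; field names and conventions are illustrative, not the released schema.

```python
import numpy as np

def ttc_from_annotation(corners_ego, v_rel):
    """Time-to-contact tau = min(Z) / v_rel.

    corners_ego : (8, 3) cuboid corners in the ego frame, Z forward (m)
    v_rel       : ego-relative closing speed along Z (m/s, > 0 when approaching)
    """
    z_min = corners_ego[:, 2].min()  # depth of the nearest cuboid point
    if v_rel <= 0:
        return np.inf  # object not approaching: TTC is unbounded
    return z_min / v_rel

# Toy cuboid 30 m ahead, closing at 10 m/s -> tau = 3 s.
corners = np.array(
    [[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (30, 34)], float
)
print(ttc_from_annotation(corners, v_rel=10.0))  # 3.0
```

Taking min(Z) over the cuboid corners makes τ conservative: it measures time until the nearest point of the object reaches the ego plane, not its center.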
3D cuboid annotations projected on RGB and event views across diverse driving scenarios and illumination conditions.
BEV of LiDAR point cloud with 3D boxes and velocity curves of object trajectories.
Projected LiDAR point cloud on RGB image demonstrating calibration quality.
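The projection shown above follows the standard pinhole model. A minimal sketch, assuming a 3×3 intrinsic matrix K and a LiDAR-to-camera extrinsic [R | t]; the calibration values below are placeholders, not the released parameters.

```python
import numpy as np

def project_lidar_to_image(points, K, R, t):
    """Project Nx3 LiDAR points into pixel coordinates.

    Returns (u, v) pixels for the kept points and a mask of points
    in front of the camera.
    """
    cam = points @ R.T + t           # LiDAR frame -> camera frame
    in_front = cam[:, 2] > 0.1       # keep points with positive depth
    uvw = cam[in_front] @ K.T        # pinhole projection to homogeneous pixels
    uv = uvw[:, :2] / uvw[:, 2:3]    # perspective divide
    return uv, in_front

# Placeholder calibration: identity rotation, zero translation,
# principal point at the image center of a 1920x1200 frame.
K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 600.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
pts = np.array([[0.0, 0.0, 10.0], [1.0, 0.0, 10.0]])
uv, mask = project_lidar_to_image(pts, K, R, t)
print(uv)  # [[960. 600.] [1060. 600.]]
```

In practice the points would first be motion-compensated to the camera trigger time using the ego-pose before applying the extrinsic.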
Showcase
The examples below are 10-second previews rendered directly from the released assets. They are grouped by road type.
Downloads
Each asset is distributed as a standalone <asset_id>.zip. The main table covers all assets, and the HDR table below groups all HDR assets in one place.
Citation
```bibtex
@misc{li2026eap,
  title         = {Toward Deep Representation Learning for Event-Enhanced
                   Visual Autonomous Perception: the eAP Dataset},
  author        = {Li, Jinghang and Li, Shichao and Lian, Qing
                   and Li, Peiliang and Chen, Xiaozhi and Zhou, Yi},
  year          = {2026},
  eprint        = {2603.16303},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2603.16303},
}
```