Challenges - Rich Media with Generative AI

Challenge Overview

Recent benchmarks have advanced 3D and 4D scene reconstruction, with datasets such as Mip-NeRF 360 focusing on static scenes, and later works like D-NeRF and Neural 3D Video introducing limited dynamics. However, these datasets primarily evaluate reconstruction quality and do not support systematic evaluation of free-viewpoint video (FVV) generation and delivery. In parallel, neural video compression benchmarks such as Vimeo-90K study bitrate-fidelity trade-offs in the image domain but lack 3D geometry and camera calibration.

To address these gaps, we introduce the M3VIR-2 dataset, a fully synthetic and richly annotated benchmark extending M3VIR (released by RichMediaGAI at MM2025) for evaluating both generative FVV synthesis and efficient neural delivery. Scenes contain static environments and dynamic objects rendered from six synchronized ego-centric cameras, with ground-truth depth, semantic and instance segmentation, and static-dynamic masks.

Stay tuned! Details about the challenge tracks and the dataset will be released soon.