Recent benchmarks have advanced 3D and 4D scene reconstruction, with datasets such as Mip-NeRF 360 focusing on static scenes, and later works like D-NeRF and Neural 3D Video introducing limited dynamics. However, these datasets primarily evaluate reconstruction quality and do not support systematic evaluation of free-viewpoint video (FVV) generation and delivery. In parallel, neural video compression benchmarks such as Vimeo-90K study bitrate-fidelity trade-offs in the image domain but lack 3D geometry and camera calibration.
To address these gaps, we introduce the M3ISR dataset, a fully synthetic and richly annotated benchmark extending M3VIR (released by RichMediaGAI at MM2025) for evaluating both generative FVV synthesis and efficient neural delivery. Each scene combines a static environment with dynamic objects, rendered from six synchronized egocentric cameras and accompanied by ground-truth depth, semantic and instance segmentation, and static-dynamic masks.
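To make the annotation set concrete, the following is a minimal sketch of a per-frame record that a loader might expose. The class name `M3ISRFrame`, the field names, and the array shapes are hypothetical illustrations of the modalities listed above, not the released file layout.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class M3ISRFrame:
    """Hypothetical per-frame record; fields mirror the annotation
    modalities described in the text, not an official schema."""
    rgb: np.ndarray            # (6, H, W, 3) uint8, one image per synchronized camera
    depth: np.ndarray          # (6, H, W) float32 ground-truth depth (units assumed: meters)
    semantic: np.ndarray       # (6, H, W) int32 semantic class IDs
    instance: np.ndarray       # (6, H, W) int32 instance IDs
    dynamic_mask: np.ndarray   # (6, H, W) bool, True where a pixel belongs to a dynamic object
    intrinsics: np.ndarray     # (6, 3, 3) per-camera calibration matrices
    extrinsics: np.ndarray     # (6, 4, 4) camera-to-world poses
    timestamp: float           # shared capture time for the synchronized rig
```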
The RichMediaGAI Challenge 2026 includes five tracks, described in turn below.

3D Gaussian Splatting (3DGS) has emerged as a promising representation for novel view synthesis (NVS). However, large-scale and complex scenes remain challenging due to intricate geometry, view-dependent appearance, and fine-grained details. The goal of Track 1 is to improve the rendering quality of 3DGS-based NVS while maintaining real-time computational efficiency.
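At the core of 3DGS rendering is the standard front-to-back alpha compositing of depth-sorted splats, C = Σ_i c_i α_i Π_{j<i} (1 − α_j). The snippet below is a minimal per-pixel sketch of that blending rule; production renderers implement it in a tiled CUDA rasterizer rather than per-pixel Python.

```python
import numpy as np

def composite_pixel(colors, alphas):
    """Front-to-back alpha compositing over depth-sorted splats:
    C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j).
    `colors` is (N, 3) and `alphas` is (N,) for the N Gaussians
    overlapping one pixel, sorted near-to-far. Illustrative only."""
    pixel = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * c
        transmittance *= 1.0 - a
        if transmittance < 1e-4:  # early termination once effectively opaque
            break
    return pixel
```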
4D Gaussian Splatting (4DGS) extends 3DGS to dynamic scenes by incorporating the temporal dimension. However, real-world dynamic scenes introduce additional challenges, including complex motion, occlusion, and temporally varying appearance, making it difficult to maintain both spatial fidelity and temporal consistency. The goal of Track 2 is to improve the quality of 4DGS-based NVS and encourage methods that achieve photorealistic rendering across both spatial and temporal dimensions.
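As one concrete illustration of how the temporal dimension enters, many 4DGS variants keep a canonical set of Gaussians and displace them with a learned deformation field, μ_i(t) = μ_i + D(μ_i, t). The sketch below assumes that formulation; `deformation` stands in for the learned field (typically an MLP) and is purely illustrative, not the method mandated by this track.

```python
import numpy as np

def deform_means(canonical_means, t, deformation):
    """Displace canonical Gaussian means with a time-conditioned
    deformation field: mu_i(t) = mu_i + D(mu_i, t). `deformation`
    is any callable (mu, t) -> (3,) offset; a learned MLP in practice."""
    offsets = np.stack([deformation(mu, t) for mu in canonical_means])
    return canonical_means + offsets

# Toy usage: a zero deformation leaves the canonical scene static.
means = np.random.randn(100, 3)
static = deform_means(means, t=0.5, deformation=lambda mu, t: np.zeros(3))
assert np.allclose(static, means)
```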
4DGS supports free-viewpoint video, making it highly attractive for immersive streaming applications. However, real-time streaming of 4DGS content imposes strict constraints on latency and bandwidth, and efficiently transmitting temporally coherent Gaussian representations while preserving rendering quality under limited network conditions remains an open problem. The goal of Track 3 is to develop efficient 4DGS streaming strategies that balance visual fidelity, transmission efficiency, and low-latency playback.
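The latency-bandwidth trade-off can be made concrete with a toy rate-adaptation rule: given tiered encodings of each temporal segment, pick the highest-bitrate tier whose download time fits the playback deadline. All names and numbers below are hypothetical; the challenge does not prescribe this protocol.

```python
def pick_tier(tiers, bandwidth_bps, segment_seconds, latency_budget_s):
    """Toy rate adaptation for a segmented 4DGS bitstream: choose the
    highest-bitrate tier whose download time fits the latency budget.
    `tiers` is a list of (bitrate_bps, label) pairs; illustrative only."""
    feasible = [
        (rate, label) for rate, label in tiers
        if (rate * segment_seconds) / bandwidth_bps <= latency_budget_s
    ]
    return max(feasible) if feasible else min(tiers)  # fall back to cheapest tier

tiers = [(2_000_000, "low"), (8_000_000, "mid"), (20_000_000, "high")]
print(pick_tier(tiers, bandwidth_bps=10_000_000,
                segment_seconds=1.0, latency_budget_s=1.0))  # -> (8000000, 'mid')
```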
3DGS provides a powerful representation for high-quality NVS. However, the large number of Gaussian primitives and associated attributes often results in substantial storage and transmission costs, limiting scalability and practical deployment. Reducing redundancy in the parameters of 3DGS representations while preserving rendering quality remains a key challenge. The goal of Track 4 is to develop effective 3DGS compression methods that achieve a better trade-off between representation compactness, rendering fidelity, and computational efficiency.
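The compactness-fidelity knob is easiest to see with scalar quantization of a single attribute channel (e.g., opacities or spherical-harmonic coefficients): fewer bits per code shrink the representation but raise reconstruction error. The sketch below is illustrative only; practical 3DGS codecs add pruning, vector quantization, and entropy coding on top.

```python
import numpy as np

def quantize_attributes(values, num_bits=8):
    """Uniform scalar quantization of one attribute channel: map floats
    to integer codes and back. `num_bits` is the rate-fidelity knob."""
    lo, hi = values.min(), values.max()
    levels = (1 << num_bits) - 1
    codes = np.round((values - lo) / (hi - lo + 1e-12) * levels).astype(np.uint32)
    dequantized = lo + codes / levels * (hi - lo)
    return codes, dequantized

opacity = np.random.rand(10_000).astype(np.float32)
codes, recon = quantize_attributes(opacity, num_bits=8)
print(np.abs(opacity - recon).max())  # worst-case error ~ (hi - lo) / 255
```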
4DGS extends 3DGS to dynamic scenes by modeling both spatial and temporal variations. While this enables photorealistic free-viewpoint video rendering, it also introduces significantly higher representation complexity and redundancy across both space and time. Efficiently reducing redundancy in the parameters of 4DGS representations without sacrificing spatial quality or temporal consistency remains an open problem. The goal of Track 5 is to develop effective 4DGS compression methods that improve representation compactness while maintaining high rendering fidelity and efficient decoding performance.
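One source of the temporal redundancy noted above is that Gaussian attributes drift slowly between adjacent frames, so per-frame deltas are small and cheap to entropy-code. The sketch below illustrates this with residual coding of the Gaussian means only; it is an illustration of the redundancy, not a complete codec.

```python
import numpy as np

def temporal_residuals(means_per_frame):
    """Store frame 0 plus per-frame residuals of the Gaussian means,
    which are small for temporally coherent motion. Lossless before
    any quantization; omits rotation/scale/appearance attributes."""
    base = means_per_frame[0]
    residuals = np.diff(means_per_frame, axis=0)  # (T-1, N, 3) deltas
    return base, residuals

def reconstruct(base, residuals):
    """Invert the residual coding: frame t = base + sum of deltas up to t."""
    return np.concatenate([base[None], base[None] + np.cumsum(residuals, axis=0)])

seq = np.cumsum(0.01 * np.random.randn(8, 500, 3), axis=0)  # slow coherent drift
base, res = temporal_residuals(seq)
assert np.allclose(reconstruct(base, res), seq)
```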