Regular Paper Submissions
We invite high-quality papers presenting novel research in generative AI for rich media.
Topics of interest include, but are not limited to:
- AI-driven media generation, restoration, and enhancement
- 3D content generation using AI
- AI-driven media compression and codec design
- Editing and manipulation of rich media with generative AI
- Multi-modal generative AI techniques
- Rich media creation and editing with large language models
- Acceleration of generative AI models
Challenge Paper Submissions
We also welcome papers on the workshop's challenges, focusing on innovative solutions, methodologies,
and findings. These submissions should highlight technical contributions, benchmark results, and insights gained from participation.
Fast-Track Submissions
In addition to the general submission guidelines, high-quality submissions that were rejected from ACM MM 2025
may be submitted together with a supplementary package containing the original reviews and
a statement describing the changes made to the paper. These materials will be checked by the TPC chairs,
and such submissions may be eligible for a fast-track review process.
Accepted papers will be published in the workshop proceedings by Sheridan Publishing on behalf of ACM.
Papers must follow the official ACM Multimedia 2025 format. For details,
please refer to the ACM Multimedia 2025 Call for Papers.
Submissions must be anonymized for peer review and must not exceed 8 pages (excluding references).
Submission Link: OpenReview Paper Submission
Please note that every author needs an OpenReview account for paper submission. New OpenReview profiles created without an institutional email
go through a moderation process that can take up to two weeks.
Accepted Papers
- SLMQuant: Benchmarking Small Language Model Quantization for Practical Deployment (Jiacheng Wang, Yejun Zeng, Jinyang Guo, Yuqing Ma, Aishan Liu, Xianglong Liu)
- DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics (Yihan Hu, Jianing Peng, Yiheng Lin, Ting Liu, Xiaochao Qu, Luoqi Liu, Yao Zhao, Yunchao Wei)
- Efficient and Accurate Post-Training Sparsification of Large Language Models with Proximal Operators (Pu Zhao, Dani Gunawan, Xuan Shen, Zheng Zhan, Xuehang Guo, Jun Liu, Zhenglun Kong, Yanzhi Wang, Gaowen Liu, Xue Lin)
- M3VIR: A Large-Scale Multi-Modality Multi-View Synthesized Benchmark Dataset for Image Restoration and Content Creation (Yuanzhi Li, Lebin Zhou, Nam Ling, Zhenghao Chen, Wei Wang, Wei Jiang)
- BlingDiff: High-Fidelity Virtual Jewelry Try-On with Detail-Optimized Diffusion (Yunfang Niu, Lingxiang Wu, Dong Yi, Lu Zhou, Jinqiao Wang)
- Text-to-Image Generation Post-Training with Pixel-Space Loss (Christina Zhang, Simran Motwani, Matthew Yu, Ji Hou, Felix Juefei-Xu, Sam Tsai, Peter Vajda, Zijian He, Jialiang Wang)