ACCV 2024 Workshop on

Rich Media with Generative AI

Date: Monday 8 Dec or Tuesday 9 Dec 2024, 9:00 am - 5:00 pm (Vietnam Time/GMT+7)
(specific date to be decided by ACCV)
Location: Hanoi, Vietnam + Zoom



Overview


The goal of this workshop is to showcase the latest developments in generative AI for creating, editing, restoring, and compressing rich media, such as images, videos, neural radiance fields, and 3D scene properties. Generative AI models, such as GANs and diffusion models, have enabled remarkable achievements in rich media, in both academic research and industrial applications. For instance, cloud-based video gaming is a booming industry with an expected global market value of over $12 billion by 2025. Generative AI is transforming the gaming industry by enabling anyone to build and design games without professional artistic or technical skills, empowering immense market growth.

Building on the success of the 1st RichMediaGAI Workshop at WACV 2024, we expand the 2nd RichMediaGAI Workshop at ACCV 2024 by organizing competitions with industry-level data, soliciting paper submissions, and continuing to invite top-tier speakers from both industry and academia to foster synergy.


Important Dates + Author Guidelines


Author Guidelines: Formatting, Page Limits, Author Kits, and Submission Policies follow the ACCV 2024 Author Guidelines

Challenges Data Available at: August 6, 2024, 11:59 PM PST
Regular Paper Submission Deadline: Extended to September 27, 2024, 11:59 PM PST
Challenges Results and Reports Submission Deadline: Extended to September 27, 2024, 11:59 PM PST
Submission Site: CMT Submission Site
Paper Review Back and Decision Notification: October 4, 2024, 11:59 PM PST
Challenges Results and Decision Notification: October 4, 2024, 11:59 PM PST
Camera-Ready Deadline: October 10, 2024, 11:59 PM PST


1. Regular Paper Submissions


Papers addressing topics related to image/video restoration, compression, enhancement, and manipulation using generative AI technologies are welcome. Topics include, but are not limited to:

Formatting, page limits, author kits, and submission policies follow the ACCV 2024 Author Guidelines. All papers must be uploaded to the submission site by the deadline. There is no rebuttal for this call. Reviews and paper decisions will be sent to the authors on the date specified above.

2. Challenges


Call for Participation

Cloud gaming poses tremendous challenges for compression and transmission. To avoid delay and bandwidth overload, high-quality frames need to be heavily compressed with very low latency. Traditional codecs like H.264/H.265/H.266, as well as recent neural video codecs targeting natural videos, generally do not perform well on gaming content.

Generative AI technologies, e.g., super-resolution, image synthesis, and rendering, can largely alleviate these transmission issues. Server-side computation and transmission can be reduced by leveraging the computation power of client devices. For example, the server can render and transmit low-resolution (LR) frames, and high-resolution (HR) frames can be computed on the client side. In multiview gaming, the server can render and transmit a subset of views, and the remaining views can be computed by client devices. NVIDIA's Deep Learning Super Sampling (DLSS) has commercialized this idea, and one key factor in its success is the large-scale ground-truth LR-HR and multiview gaming data used for training.
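As an illustration of this server/client split (not part of the challenge specification), the sketch below downsamples a frame on the "server" and upscales it on the "client", using plain bicubic interpolation as a stand-in for a trained generative super-resolution network such as DLSS. The function names, the scale factor, and the PNG transport are assumptions for illustration only.

```python
# Minimal sketch of the server/client split described above (illustrative only).
# A real cloud-gaming pipeline would use a hardware video codec and a trained
# generative super-resolution network on the client; cv2.resize is a stand-in.
import cv2
import numpy as np

SCALE = 2  # assumed per-axis upscaling factor

def server_side(hr_frame: np.ndarray) -> bytes:
    """Render/obtain the frame at low resolution and encode it for transmission."""
    h, w = hr_frame.shape[:2]
    lr_frame = cv2.resize(hr_frame, (w // SCALE, h // SCALE),
                          interpolation=cv2.INTER_AREA)
    ok, payload = cv2.imencode(".png", lr_frame)  # placeholder for a video codec
    return payload.tobytes()

def client_side(payload: bytes) -> np.ndarray:
    """Decode the LR frame and reconstruct an HR frame on the client."""
    lr_frame = cv2.imdecode(np.frombuffer(payload, np.uint8), cv2.IMREAD_COLOR)
    h, w = lr_frame.shape[:2]
    # Stand-in for a learned SR model: bicubic upscaling back to full resolution.
    return cv2.resize(lr_frame, (w * SCALE, h * SCALE),
                      interpolation=cv2.INTER_CUBIC)

if __name__ == "__main__":
    hr = (np.random.rand(720, 1280, 3) * 255).astype(np.uint8)  # dummy HR frame
    reconstructed = client_side(server_side(hr))
    print(hr.shape, reconstructed.shape)
```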

In comparison, the research community relies on pseudo training data for many restoration tasks. For super-resolution, for example, LR data is generated from HR data by downsampling and adding degradations such as noise and blur. Such pseudo data does not match real gaming data: true LR gaming frames are high-quality, sharp, and clean, without noise or blur. They contain unnatural visual effects and object movements but little motion blur, unlike captured natural videos. Ground-truth gaming data is therefore needed for effective training.
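To make the mismatch concrete, here is a minimal sketch of the kind of pseudo-LR generation commonly used in super-resolution research (synthetic blur, bicubic downsampling, and additive noise). The kernel size, noise level, and function name are illustrative assumptions, not the pipeline used to build the challenge dataset.

```python
# Typical pseudo-LR generation used in SR research (illustrative sketch).
# Real LR gaming frames are rendered directly and are sharp and noise-free,
# so data produced this way does not match them.
import cv2
import numpy as np

def make_pseudo_lr(hr_frame: np.ndarray, scale: int = 2,
                   blur_sigma: float = 1.5, noise_std: float = 5.0) -> np.ndarray:
    """Synthesize an LR frame from an HR frame by blur + downsample + noise."""
    blurred = cv2.GaussianBlur(hr_frame, (7, 7), blur_sigma)      # synthetic blur
    h, w = blurred.shape[:2]
    lr = cv2.resize(blurred, (w // scale, h // scale),
                    interpolation=cv2.INTER_CUBIC)                 # bicubic downsample
    noise = np.random.normal(0.0, noise_std, lr.shape)             # additive Gaussian noise
    return np.clip(lr.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```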

In this competition, a large computer-synthesized ground-truth dataset is provided, targeting two different applications:

The winners will be announced at the RichMediaGAI workshop, and the top three non-corporate winners of each track will receive awards of $2,000 (1st), $1,000 (2nd), and $500 (3rd). Winners are invited to submit a paper to the RichMediaGAI workshop through the paper submission system. To be accepted, each paper must be a self-contained description of the method and detailed enough to reproduce the results. Paper submissions must follow the ACCV 2024 Author Guidelines.





3. Invited Talks

Chen Change Loy
Nanyang Technological University

Chen Change Loy is a President's Chair Professor with the College of Computing and Data Science, Nanyang Technological University, Singapore. He is the Lab Director of MMLab@NTU and Co-associate Director of S-Lab. Prior to joining NTU, he served as a Research Assistant Professor at the MMLab of The Chinese University of Hong Kong, from 2013 to 2018. His research interests include computer vision and deep learning with a focus on image/video restoration and enhancement, generative tasks, and representation learning. He serves as an Associate Editor of the International Journal of Computer Vision (IJCV), IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), and Computer Vision and Image Understanding (CVIU). He also serves/served as an Area Chair of top conferences such as ICCV, CVPR, ECCV, ICLR and NeurIPS. He will serve as the Program Co-Chair of CVPR 2026. He is a senior member of IEEE.

Dong Tian
InterDigital

Dong Tian is a Senior Director with InterDigital, Inc. He has been actively contributing to MPEG industry standards and academic communities for over 20 years. He holds 30+ U.S.-granted patents and 50+ recent publications in top-tier journals/transactions and conferences. His research interests include image processing, 3D video, point cloud processing, and deep learning. He has served as the Chair of MPEG-AI and MPEG 3DGH on AI-Based Graphic Coding since 2021, Chair of the MSA TC from 2023 to 2025, and a General Co-Chair of MMSP'20 and MMSP'21.

Yanzhi Wang
Northeastern University

Yanzhi Wang is currently an associate professor and faculty fellow in the Department of ECE at Northeastern University, Boston, MA. His research interests focus on model compression and platform-specific acceleration of deep learning applications. His work has been published broadly in top conference and journal venues (e.g., DAC, ICCAD, ASPLOS, ISCA, MICRO, HPCA, PLDI, ICS, PACT, ISSCC, AAAI, ICML, NeurIPS, CVPR, ICLR, IJCAI, ECCV, ICDM, ACM MM, FPGA, LCTES, CCS, VLDB, ICDCS, RTAS, Infocom, C-ACM, JSSC, TComputer, TCAS-I, TCAD, JSAC, TNNLS, etc.). He has received six Best Paper and Top Paper Awards, and one Communications of the ACM cover-featured article. He has another 13 Best Paper Nominations and four Popular Paper Awards. He has received the U.S. Army Young Investigator Program (YIP) Award, IEEE TC-SDM Early Career Award, APSIPA Distinguished Leader Award, Massachusetts Acorn Innovation Award, Martin Essigmann Excellence in Teaching Award, Ming Hsieh Scholar Award, and other research awards from Google, MathWorks, etc.

Tianfan Xue
The Chinese University of Hong Kong

Tianfan Xue is a Vice-Chancellor Assistant Professor in the Department of Information Engineering at The Chinese University of Hong Kong. His research interests include computer vision, machine learning, and computer graphics, with a focus on generative AI and neural rendering.

Mike Zheng Shou
National University of Singapore

Mike Z. Shou is an Assistant Professor at the National University of Singapore. His research focuses on computer vision and deep learning, with an emphasis on developing intelligent systems for video understanding and creation. He was awarded the Wei Family Private Foundation Fellowship from 2014 to 2017 and received a best student paper nomination at CVPR 2017. His team won first place in the International Challenge on Activity Recognition (ActivityNet) 2017. He won the Singapore NRF Fellowship for his proposal titled "Towards Next-generation Video Intelligence: Training Machines to Understand Actions and Complex Events", which carries a research grant that enables early-career researchers to carry out independent research locally. Mike looks forward to developing new deep learning methods that allow machines to understand actions and complex events in videos; this can power many applications such as perception systems for self-driving cars, care robots for the elderly, smart CCTV cameras, social media recommendation systems, and intelligent video creation tools for journalists and filmmakers, to name a few.

Junfeng He
Google Research

Junfeng He is with Google Research. He has published more than 25 papers in top-tier conferences and journals such as CVPR, ICML, TPAMI, and Proceedings of the IEEE, cited more than 1,000 times. He has served as a TPC member of ACM MM, CVPR, and several other conferences.




4. Program Schedule (TBD)





5. Organizers


Wei Jiang
Futurewei Technologies
Lebin Zhou
Santa Clara University
Jinwei Gu
The Chinese University of Hong Kong

Kun Han
University of California Irvine
Zijiang (James) Yang
Guard Strike

Contacts

Dataset related questions: Lebin Zhou

Paper related and other general questions: Wei Jiang, Jinwei Gu