Greetings! I am a staff research scientist and manager at Google, and an adjunct faculty member at Carnegie Mellon University. My CV can be found here. My research aims to solve real problems on big multimodal data, focusing on the interdisciplinary field of Machine Learning and Multimedia, specifically robust deep learning, generative AI, and multimodal foundation models.
What's New:
- Looking for highly motivated full-time and student researchers to work on video generation and multimodal foundation models. Check out our recent work and contact me if you are interested.
- [2024/01] Served as an area chair of CVPR 2024 and ICML 2024.
- [2024/01] Our MAGVIT-v2, the market's leading video tokenizer and a crucial component in VideoPoet and WALT, was accepted to ICLR 2024.
- [2023/12] Announcing VideoPoet and Future of Storytelling in this Short Video. VideoPoet was my primary focus in 2023, from the initial design, through the implementation of versions v0, up to its current milestone.
- [2023/11] Introducing W.A.L.T, a diffusion model for photorealistic video generation. Our model is a transformer trained on image and video generation in a shared latent space.
- [2023/11] StyleDrop is here, a model that stylizes text-to-image synthesis using a few reference images.
- [2023/03] Check out our research breakthrough on multi-task video generation, MAGVIT, accepted by CVPR 2023 with nearly perfect review scores! Website: https://magvit.cs.cmu.edu/.
- [2023/03] Served as an area chair of ICCV 2023 and NeurIPS 2023.
- [2023/01] Check out our latest text-to-image model, MUSE, based on a masked vision transformer.
- [2022/12] Served as an area chair of CVPR 2023.
- [2022/06] Our paper Pyramid Adversarial Training Improves ViT Performance (CVPR'22) was selected as a best paper finalist.
- [2022/06] Our code for ViTGAN (ICLR'22) is on GitHub.
- [2022/06] The Controlled Noisy Web Labels dataset (ICML'20) is now on TFDS, providing controlled and realistic noisy labels for robust deep learning.
- [2022/04] Our code for MaskGIT (CVPR'22) is on GitHub.
- [2021/10] Started working as an Adjunct faculty member at Carnegie Mellon University.
- [2021/09] Named a best reviewer at ICML 2020 and 2021 and an outstanding reviewer at NeurIPS 2021.
- [2021/07] Named an AI 2000 Most Influential Scholar (#31 in Multimedia).
- [2021/03] Our code for LeCAM-GAN (CVPR'21) is on GitHub. It is ranked as the leading GAN model on the CIFAR-100 and limited ImageNet datasets.
- [2021/05] Gave invited talks on robust deep learning at the Weakly Supervised Learning workshop at ICLR 2021 and at LTI, Carnegie Mellon University.
- [2020/10] Served on the thesis committee for Junwei Liang, Ph.D. candidate at Carnegie Mellon University.
- [2020/10] Congrats to our former student Yu for receiving the illustrious Google Fellowship 2020.
- [2020/07] Introducing our work on robust deep learning with noisy labels, published at ICML 2020. [Google AI blog, project page]. Check out the recommendations on how to deal with noisy labels.
- [2020/06] Our dataset, The Garden of Forking Paths, is now available on GitHub. It is the first dataset that allows us to compare models in a quantitative way in terms of their ability to predict multiple plausible futures.
- [2020/05] Co-organized two workshops at CVPR 2020: (1) AI for Content Creation and (2) Language and Vision with applications to Video Understanding.
- [2019/09] Congrats to our former intern Junwei for receiving Baidu Scholarship 2019 (10 recipients globally).
- [2019/09] Served as the NSF America's Seed Fund (NSF SBIR) panelist for AI.
- [2019/07] Our work on robust learning was named a best paper candidate at ACL 2019 (1% of all submitted long papers).
- [2019/05] Our code for Composing Text and Image for Image Retrieval (CVPR 2019) is now on GitHub. It introduces a new retrieval task that draws on vision and language research.
- [2019/05] Our code for future activity prediction (CVPR 2019) is now available. It is the first and currently the best model for joint path and activity prediction. Check out the demo video.
- [2019/05] Our code for Eidetic-3D LSTM (ICLR 2019) is now on GitHub.
- [2019/03] Two guest lectures (LTI-11-775) on vision + language at Carnegie Mellon University.
- [2019/03] Served as an area chair of ACM Multimedia 2019.
- [2019/01] The code for our graph distillation (ECCV 2018) is now on GitHub.
- [2018/12] Check out our MemexQA dataset published in TPAMI 2019.
- [2018/09] Dealing with noisy data in deep learning? Check out our code for ICML 2018.
- [2018/07] Check out our code for visual question answering over sequence data (CVPR 2018).
- [2018/03] Congrats to our intern Zelun (Alan) Luo, co-hosted with Juan Carlos, for receiving Ph.D. offers from top universities (MIT/Stanford/UC Berkeley/CMU).