Greetings! I am a Research Scientist at Apple focused on visual generation and multimodal foundation models. Previously, I worked at ByteDance/TikTok and Google and as an Adjunct Faculty member at Carnegie Mellon University. My research addresses real-world challenges in robust deep learning, generative AI, and large-scale multimodal data. CV
What's New:
- Looking for a highly motivated full-time candidate to work on visual generation foundation models. Check out our recent work and contact me if you are interested.
- [2025/05] Introduced Seaweed-7B, a cost-efficient foundation model for video generation. See our demo video and project page. Grateful for an outstanding year of collaboration with an exceptional team.
- [2025/03] Released Long Context Tuning (LCT), enabling scene-level video storytelling of up to 5 minutes and pushing the frontier of GenAI research in video generation.
- [2025/04] Delivered a lecture at DeepLearn. Check out the slides [part 1, part 2, part 3].
- [2025/02] Served as Area Chair for ICML 2025, ICCV 2025, and NeurIPS 2025.
- [2025/01] Introduced Seaweed-APT, a one-step video generation method for high-quality video synthesis.
- [2024/10] Served as Area Chair for ICLR 2025 and CVPR 2025, and as Action Editor for TMLR.
- [2024/08] Honored to receive the IJCAI-JAIR Best Paper Award. Thanks to collaborators Curtis and Isaac!
- [2024/07] Grateful to receive the ICML Best Paper Award. Congratulations to the team!
- [2024/07] Gave a keynote at ICME 2024. Slides available here.
- [2024/08] Appointed Associate Editor for TPAMI.
- [2024/01] Served as Area Chair for CVPR 2024 and ICML 2024.
- [2024/01] MAGVIT-v2, a leading video tokenizer powering VideoPoet and WALT, was accepted to ICLR 2024.
- [2023/12] Announced VideoPoet, my primary 2023 focus, from initial design through v0 to current milestones. Watch the intro video.
- [2023/11] Released W.A.L.T, a diffusion-based transformer model for photorealistic video generation in a unified latent space.
- [2023/11] Released StyleDrop, enabling few-shot personalized text-to-image synthesis.
- [2023/03] MAGVIT for multi-task video generation was accepted to CVPR 2023 as a Highlight.
- [2023/03] Served as Area Chair for ICCV 2023 and NeurIPS 2023.
- [2023/01] Introduced MUSE, a masked vision transformer for text-to-image generation.
- [2022/12] Served as Area Chair for CVPR 2023.
- [2022/06] Pyramid Adversarial Training (CVPR'22) was selected as a Best Paper Finalist.
- [2022/06] Released code for ViTGAN (ICLR'22).
- [2022/06] Controlled Noisy Web Labels dataset (ICML'20) is now available via TFDS.
- [2021/10] Joined Carnegie Mellon University as Adjunct Faculty.
- [2021/09] Received Best Reviewer awards at ICML 2020–2021 and Outstanding Reviewer at NeurIPS 2021.
- [2021/03] Released LeCAM-GAN (CVPR'21), top-ranked on CIFAR-100 and ImageNet (25%).
- [2021/05] Gave invited talks on robust deep learning at the ICLR 2021 WeaSuL Workshop and CMU LTI.
- [2020/10] Congrats to Yu Wu on receiving the Google Fellowship 2020.
- [2020/07] Published our work on robust learning from noisy labels at ICML 2020. [Blog, Project]
- [2020/06] Released The Garden of Forking Paths dataset for evaluating multiple plausible futures.
- [2020/05] Co-organized two CVPR 2020 workshops: AI for Content Creation and Language and Vision.
- [2019/09] Congrats to intern Junwei Liang on receiving the Baidu Scholarship 2019.
- [2019/09] Served as panelist for NSF America's Seed Fund (SBIR) on AI.
- [2019/07] Best Paper Candidate at ACL 2019 (top 1%).
- [2019/05] Released TIRG (CVPR'19) for vision-language image retrieval.
- [2019/05] Released an activity prediction model (CVPR'19), with a demo.
- [2019/05] Released Eidetic-3D LSTM (ICLR'19).
- [2019/03] Gave guest lectures (LTI-11-775) on vision + language at CMU.
- [2019/01] Released Graph Distillation (ECCV'18) on GitHub.
- [2018/12] Released MemexQA dataset (TPAMI'19).
- [2018/09] Released MentorNet for learning with noisy data (ICML'18).
- [2018/07] Released FVTA code for visual QA over sequences (CVPR'18).
- [2018/03] Congrats to intern Zelun Luo on receiving Ph.D. offers from top institutions (MIT, Stanford, UC Berkeley, CMU).