Datasets

  • Controlled Noisy Web Labels (ICML 2020) - First dataset and benchmark for real-world label noise sourced from the web.
  • MemexQA (TPAMI 2019) - Multimodal dataset consisting of real personal photos and crowd-sourced questions/answers.
  • YouTube-8M - Large-scale labeled video dataset consisting of millions of YouTube videos.
  • CMU Viral Video Dataset (ICMR 2014) - Public dataset for the study of viral videos.