Self-paced Learning with Diversity (SPLD)

Last updated: Feb 16 2015

We propose an approach called self-paced learning with diversity (SPLD), which formalizes the preference for both easy and diverse samples as a general regularizer. This regularization term is independent of the learning objective and thus can be easily generalized to various learning tasks.
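The selection step induced by this regularizer has a simple closed form: within each group, samples are ranked by loss, and the i-th easiest sample is selected when its loss falls below lambda + gamma / (sqrt(i) + sqrt(i-1)), where lambda controls easiness and gamma controls diversity. A minimal sketch in R (the function name `spld_select` and its arguments are illustrative, not part of the released code):

```r
# Sketch of the SPLD sample-selection step.
# losses: per-sample losses under the current model
# groups: cluster/group label of each sample
# lambda: easiness threshold; gamma: diversity weight
spld_select <- function(losses, groups, lambda, gamma) {
  selected <- logical(length(losses))
  for (g in unique(groups)) {
    idx <- which(groups == g)
    ord <- idx[order(losses[idx])]          # rank samples within the group by loss
    i <- seq_along(ord)
    thresh <- lambda + gamma / (sqrt(i) + sqrt(i - 1))
    selected[ord] <- losses[ord] < thresh   # later ranks face a stricter threshold
  }
  selected
}
```

Because the threshold decreases with the within-group rank, at most a few easy samples are taken from any single group, which encourages selecting across groups.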

Figure: Illustrative comparison of SPL and SPLD on the “Rock Climbing” event using real samples. SPL tends to first select the easiest samples from a single group, whereas SPLD selects both easy and diverse samples from multiple groups.

Toy example:

Download the code to run the following toy example. The example will produce the curriculum shown in the figure below. The code is written in [R] (download and install R from http://www.r-project.org/). The code is tested on [R] x86_64-w64-mingw32 version 3.0.1 (2013-05-16).
source("toy_example.r")

Hollywood2:

Download the code and the workspace, and unpack the zip file. The code is tested on [R] x86_64-w64-mingw32 version 3.0.1 (2013-05-16). Make sure that the following packages are installed properly. Note that to obtain a fair comparison, we fix the SVM parameter C = 0.5 in BatchTrain, SPL, and SPLD. For more information, please read our supplementary materials.
install.packages("Matrix")
install.packages("tseries")
install.packages("kernlab")
install.packages("LiblineaR")
install.packages("cluster")
Run the following code to get the results:
source("hollywood_adaboost.r")          #adaboost MAP = 0.4113735
source("hollywood_randomforest.r")    #random forest MAP = 0.2819644
source("hollywood_batchtrain.r")         #batch train MAP = 0.5816397
source("hollywood_spl.r")                   #spl MAP = 0.638854
source("hollywood_spld.r")                 #spld MAP = 0.6665301
*If you cannot reproduce the reported results for SPL and SPLD using the above code, this is probably due to randomness in the starting values, the clustering algorithm, and the selection of the best iteration. You can keep tuning the parameters or run the following verification code.
source("hollywood_spl_verification.r")                  #spl  MAP = 0.638854
source("hollywood_spld_verification1.r")              #spld first way to verify MAP = 0.6665301
source("hollywood_spld_verification2.r")              #spld second way to verify (using the selected sample subsets) MAP = 0.6665301
The following code runs the significance test.
source("hollywood_ttest.r")                 #runs the significance test (t-test)
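The significance test compares the per-class average precision (AP) of two methods over the action classes. A paired t-test of this kind can be sketched as follows (the AP values below are toy numbers for illustration, not our reported results):

```r
# Paired t-test over per-class AP values (toy numbers for illustration only).
spl_ap  <- c(0.60, 0.62, 0.65, 0.58)  # hypothetical per-action AP for SPL
spld_ap <- c(0.66, 0.64, 0.70, 0.63)  # hypothetical per-action AP for SPLD
tt <- t.test(spld_ap, spld_ap <- spld_ap, paired = TRUE)  # see note below
tt <- t.test(spld_ap, spl_ap, paired = TRUE)
tt$p.value   # a small p-value suggests the improvement is significant
```

The test is paired because both methods are evaluated on the same set of action classes.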

What's in the data folder?
dev.matrix: the precomputed kernel matrix for the training samples.
hollywood.feat.pca: reduced-dimensional feature vectors.
hollywood.idlist: a list of sample IDs.
hollywood.labels: label matrix for each action.
hollywood.splits: training and test split. 0 for training and 1 for testing.
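To illustrate how the split vector is used (a toy stand-in is defined below; the real `hollywood.splits` is loaded from the workspace):

```r
# Toy stand-in for hollywood.splits: 0 marks training samples, 1 marks test samples.
splits <- c(0, 0, 1, 0, 1)
train.idx <- which(splits == 0)  # indices of training samples
test.idx  <- which(splits == 1)  # indices of test samples
```

These index vectors can then be used to subset the kernel matrix, features, and labels into training and test portions.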

Olympic Sports:

Download the code and the workspace. As with the Hollywood2 dataset, unpack the zip file. The code is tested on [R] x86_64-w64-mingw32 version 3.0.1 (2013-05-16). To obtain a fair comparison, we fix the SVM parameter C = 0.1 in BatchTrain, SPL, and SPLD. For more information, please read our supplementary materials.
Run the following code to get the results:
source("olympic_adaboost.r")          #adaboost MAP = 0.6924610
source("olympic_randomforest.r")    #random forest MAP = 0.6332013
source("olympic_batchtrain.r")         #batch train MAP = 0.9060558
source("olympic_spl.r")                   #spl MAP = 0.9082879
source("olympic_spld.r")                 #spld MAP = 0.9310679
*If you cannot reproduce the reported results for SPL and SPLD using the above code, this is probably due to randomness in the starting values, the clustering algorithm, and the selection of the best iteration. You can keep tuning the parameters or run the following verification code.
source("olympic_spl_verification.r")                #spl  to verify MAP = 0.9082879
source("olympic_spld_verification.r")              #spld to verify (using the selected sample subsets) MAP = 0.9310679
The following code runs the significance test.
source("olympic_ttest.r")                 #runs the significance test (t-test)
*For the Olympic Sports dataset, which is relatively easy, many iterations achieve the same perfect validation AP (i.e., 100%) but differ in test AP, so randomness in breaking ties affects the final results. In our experiments, we thoroughly tuned the parameters, executed the code many times, and then applied heuristic rules to break the ties.

Multimedia Event Detection (MED):

Download the code and the workspace. As with the Hollywood2 dataset, unpack the zip file. For more information, please read our supplementary materials.
Run the following code to get the results:
source("med_adaboost.r")           #adaboost MAP = 0.03049
source("med_randomforest.r")    #random forest MAP = 0.02820
source("med_batchtrain.r")         #batch train MAP = 0.08257
source("med_spl.r")                   #spl MAP = 0.09552
source("med_spld.r")                 #spld MAP = 0.12112
*If you cannot reproduce the reported results for SPL and SPLD using the above code, this is probably due to randomness in the starting values, the clustering algorithm, and the selection of the best iteration. You can keep tuning the parameters or run the following verification code.
source("med_spl_verification1.r")                   #spl to verify MAP = 0.09552
source("med_spl_verification2.r")                   #spl another way to verify MAP = 0.09552
source("med_spld_verification1.r")                 #spld first way to verify MAP = 0.12112
source("med_spld_verification2.r")                 #spld another way to verify MAP = 0.12112
The following code runs the significance test.
source("med_ttest.r")                 #runs the significance test (t-test)

Citation:

Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, Alexander G. Hauptmann. Self-Paced Learning with Diversity. In Advances in Neural Information Processing Systems (NIPS), 2014.

(C) COPYRIGHT 2015, Carnegie Mellon University. All Rights Reserved.