Self-paced Learning with Diversity (SPLD)

Last updated: Feb 16 2015

We propose an approach called self-paced learning with diversity (SPLD), which formalizes the preference for both easy and diverse samples as a general regularizer. This regularization term is independent of the learning objective and thus can be easily generalized to various learning tasks.
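The selection step induced by this regularizer has a simple closed form: within each group, samples are ranked by loss, and the i-th easiest sample is selected when its loss falls below lambda + gamma / (sqrt(i) + sqrt(i-1)), where lambda controls easiness and gamma controls diversity. A minimal sketch in R (the function name `spld_select` and its arguments are illustrative, not part of the released code):

```r
# Sketch of the SPLD sample-selection step.
# losses: per-sample losses under the current model
# groups: cluster/group label of each sample
# lambda: easiness threshold; gamma: diversity weight
spld_select <- function(losses, groups, lambda, gamma) {
  selected <- logical(length(losses))
  for (g in unique(groups)) {
    idx <- which(groups == g)
    ord <- idx[order(losses[idx])]          # rank samples within the group by loss
    i <- seq_along(ord)
    thresh <- lambda + gamma / (sqrt(i) + sqrt(i - 1))
    selected[ord] <- losses[ord] < thresh   # later ranks face a stricter threshold
  }
  selected
}
```

Because the threshold decreases with the within-group rank, at most a few easy samples are taken from any single group, which encourages selecting across groups.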

Figure: Illustrative comparison of SPL and SPLD on the “Rock Climbing” event using real samples. SPL tends to first select the easiest samples from a single group, whereas SPLD selects both easy and diverse samples from multiple groups.

Toy example:

Download the code to run the following toy example. The example will produce the curriculum shown in the figure below. The code is written in [R] (download and install R from http://www.r-project.org/). The code is tested on [R] x86_64-w64-mingw32 version 3.0.1 (2013-05-16).
source("toy_example.r")

Hollywood2:

Download the code and the workspace, and unpack the zip file. The code is tested on [R] x86_64-w64-mingw32 version 3.0.1 (2013-05-16). Make sure that the following packages are installed properly. Note that to obtain a fair comparison, we fix the SVM parameter C = 0.5 in BatchTrain, SPL, and SPLD. For more information, please read our supplementary materials.
install.packages("Matrix")
install.packages("tseries")
install.packages("kernlab")
install.packages("LiblineaR")
install.packages("cluster")
Run the following code to get the results:
source("hollywood_adaboost.r")          #adaboost MAP = 0.4113735
source("hollywood_randomforest.r")    #random forest MAP = 0.2819644
source("hollywood_batchtrain.r")         #batch train MAP = 0.5816397
source("hollywood_spl.r")                   #spl MAP = 0.638854
source("hollywood_spld.r")                 #spld MAP = 0.6665301
*If you cannot reproduce the reported results for SPL and SPLD using the above code, this is probably due to randomness in the starting values, the clustering algorithm, and the selection of the best iteration. You can keep tuning the parameters or run the following verification code.
source("hollywood_spl_verification.r")                  #spl  MAP = 0.638854
source("hollywood_spld_verification1.r")              #spld first way to verify MAP = 0.6665301
source("hollywood_spld_verification2.r")              #spld second way to verify (using the selected sample subsets) MAP = 0.6665301
The following code runs the significance test.
source("hollywood_ttest.r")                 #runs the significance test (t-test)
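The significance test compares the per-class average precision (AP) of two methods over the action classes. A paired t-test of this kind can be sketched as follows (the AP values below are toy numbers for illustration, not our reported results):

```r
# Paired t-test over per-class AP values (toy numbers for illustration only).
spl_ap  <- c(0.60, 0.62, 0.65, 0.58)  # hypothetical per-action AP for SPL
spld_ap <- c(0.66, 0.64, 0.70, 0.63)  # hypothetical per-action AP for SPLD
tt <- t.test(spld_ap, spld_ap <- spld_ap, paired = TRUE)  # see note below
tt <- t.test(spld_ap, spl_ap, paired = TRUE)
tt$p.value   # a small p-value suggests the improvement is significant
```

The test is paired because both methods are evaluated on the same set of action classes.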

What's in the data folder?
dev.matrix: the precomputed kernel matrix for the training samples.
hollywood.feat.pca: reduced-dimensional feature vectors.
hollywood.idlist: a list of sample IDs.
hollywood.labels: label matrix for each action.
hollywood.splits: training and test split. 0 for training and 1 for testing.
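To illustrate how the split vector is used (a toy stand-in is defined below; the real `hollywood.splits` is loaded from the workspace):

```r
# Toy stand-in for hollywood.splits: 0 marks training samples, 1 marks test samples.
splits <- c(0, 0, 1, 0, 1)
train.idx <- which(splits == 0)  # indices of training samples
test.idx  <- which(splits == 1)  # indices of test samples
```

These index vectors can then be used to subset the kernel matrix, features, and labels into training and test portions.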

Olympic Sports:

Download the code and the workspace. As with the Hollywood2 dataset, unpack the zip file. The code is tested on [R] x86_64-w64-mingw32 version 3.0.1 (2013-05-16). To obtain a fair comparison, we fix the SVM parameter C = 0.1 in BatchTrain, SPL, and SPLD. For more information, please read our supplementary materials.
Run the following code to get the results:
source("olympic_adaboost.r")          #adaboost MAP = 0.6924610
source("olympic_randomforest.r")    #random forest MAP = 0.6332013
source("olympic_batchtrain.r")         #batch train MAP = 0.9060558
source("olympic_spl.r")                   #spl MAP = 0.9082879
source("olympic_spld.r")                 #spld MAP = 0.9310679
*If you cannot reproduce the reported results for SPL and SPLD using the above code, this is probably due to randomness in the starting values, the clustering algorithm, and the selection of the best iteration. You can keep tuning the parameters or run the following verification code.
source("olympic_spl_verification.r")                #spl  to verify MAP = 0.9082879
source("olympic_spld_verification.r")              #spld to verify (using the selected sample subsets) MAP = 0.9310679
The following code runs the significance test.
source("olympic_ttest.r")                 #runs the significance test (t-test)
*For the Olympic Sports dataset, which is relatively easy, many iterations achieve the same perfect validation AP (i.e., 100%) but differ in test AP, so randomness in breaking ties affects the final results. In our experiments, we thoroughly tuned the parameters, executed the code many times, and then applied heuristic rules to break the ties.

Multimedia Event Detection (MED):

Download the code and the workspace. As with the Hollywood2 dataset, unpack the zip file. For more information, please read our supplementary materials.
Run the following code to get the results:
source("med_adaboost.r")           #adaboost MAP = 0.03049
source("med_randomforest.r")    #random forest MAP = 0.02820
source("med_batchtrain.r")         #batch train MAP = 0.08257
source("med_spl.r")                   #spl MAP = 0.09552
source("med_spld.r")                 #spld MAP = 0.12112
*If you cannot reproduce the reported results for SPL and SPLD using the above code, this is probably due to randomness in the starting values, the clustering algorithm, and the selection of the best iteration. You can keep tuning the parameters or run the following verification code.
source("med_spl_verification1.r")                   #spl to verify MAP = 0.09552
source("med_spl_verification2.r")                   #spl another way to verify MAP = 0.09552
source("med_spld_verification1.r")                 #spld first way to verify MAP = 0.12112
source("med_spld_verification2.r")                 #spld another way to verify MAP = 0.12112
The following code runs the significance test.
source("med_ttest.r")                 #runs the significance test (t-test)

Citation:

Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, Alexander G. Hauptmann. Self-Paced Learning with Diversity. In Advances in Neural Information Processing Systems (NIPS), 2014.

(C) COPYRIGHT 2015, Carnegie Mellon University. All Rights Reserved.