Challenge Summary *under construction*

Next step: JMLR Special issue on Large Scale Learning

  • Submission: 15 January 2009
  • Decision: 15 March 2009
  • Final versions: 15 April 2009

Submission to the special issue is open to everyone, granted that you follow the challenge setting for a fair comparison. See Data sets and performance measures.

Results of the Challenge have been announced during the LSL Workshop, ICML 2008

Wild track

  • SGD-QN - Antoine Bordes
  • Newton SVM - Olivier Chapelle
  • SDM SVM L1/L2 - Olivier Chapelle, Sathiya Keerthi

SVM track

  • Liblinear - Hsiang Fy Yu

Additional best student and best newcomer awards

  • Interior point SVM - Kristian Woodsend
  • Triple Jump SVM - Han-Shen Huang and Chun-Nan Hsu

To date the re-evaluation is still ongoing, but a leading group manifested itself which allows us to declare the winners of the challenge, each receiving prize money or travel grants. As the best methods perform very similar (almost same score) we don't rank the winners.

The overall score of each participant is computed as the average rank of its submission w.r.t. six scalar measures that aggregate the predictive accuracy and computational effort, for various dataset sizes. For more details see Assessment of the submissions and a discussion below.

What, who, how...

Interestingly, most participants used linear algorithms, variants of L1/L2-SVM solvers:

Among the non linear methods are:

While non-linear methods usually get a better predictive accuracy than linear ones, they often are significantly slower; all the more so as 100k examples was often enough for linear algorithms to converge due to the low data dimensionality.

This table displays the best performing methods on the challenge datasets.

DatasetBest known method Best submission
alphaquadratic0.0097740.0854AV SVM - garcke
betaquadratic0.3045630.4577AV SVM single - garcke
gammasemi-quad0.0094180.0116Avg NB Classif. - MB
deltasemi-quad0.0710050.0801Avg NB Classif. - MB
epsilonlinear0.0336450.0341IPM SVM 2 - kristian
zetalinear0.0113610.0116SDM SVM L1 - yahoo
dnaWD-SVM0.45580.8030SDM SVM L2 - yahoo
ocrn.n.n.n.0.1579SDM SVM L2 - yahoo
fdn.n.n.n.0.2314Avg. NB Classif.CR - MB
webspamn.n.n.n.0.0004SDM SVM L2 - yahoo

Generally people used the (simple-minded) data representation we suggested in our svmlight-format convert.py script. No one even tried quadratic features. Thus there is often a huge gap between what is possible and what was achieved (as we could see on the artificial datasets where we knew the optimal Bayes classifier).

The programming languages used are C/C++, Java and Matlab. The Wild track and the Alpha synthetic dataset have been the most attractive ones, with respectively 263 entries and 72 submissions. The most active participants so far are Christian Woodsend (Interior Point SVM: linear, parallel, wild track), Antoine Bordes (SGD-QN, Larank), Olivier Chapelle and Sathiya Keerthi (SDM-L1/L2, Newton). The Parallel track unfortunately did not attract much interest (see below).

Workshop and Discussion

The Pascal workshop on Large Scale Learning took place after ICML 2008, in Helsinki. All 9 talks and the discussion have been recorded and will be available at Pascal video lectures. The discussion focused on two main points: aggregating the scores and what next.

Scores, criteria, cheating...

The overall results were summarized by three figures: time vs. aoPRC; dataset size vs. training time; dataset size vs. aoPRC, reflecting the main questions from the end-user: how long does it take to reach a given performance; how does the computational effort increases with dataset size, and which amount of data is needed for reaching a given accuracy. The second figure (dataset size vs. training time) is used to compute the empirical computational complexity, which raises the question: smart stopping criteria could be used to reach a sub-linear complexity... is this fair ?

The current aggregation of the ranks is detrimental to methods which are accurate but slow; what is worse, the time needed to adjust the algorithm parameters is not accounted for, which is especially unfair to parameterless methods (e.g., Bayesian)... How to account for the model selection cost ?

The worst of all is that, by submitting a fast and idiot method, you could scramble the ranks... and kill the order. Clearly, algorithm assessment is a multi-objective problem and there is no way we can come with a single fair total order. Best is to look at the plots to see where is your optimal trade-off given the amount of data / computational budget you have.

Next Challenge

Everybody seemed to be interested in having a second round, even - as could have been expected - nobody was willing to take on. Shall we ? We don't know yet... On one hand the infrastructure is up and running; on the other hand it is difficult to obtain LARGE public datasets, and adjust the criteria accordingly...

Some points raised during the discussion:

Help and any ideas are welcome. Do you want to help organizing a LS-very-L Challenge in 2009-2010 ? Please contact us.

How the LSL challenge evolved

After an initial proposal was presented at the NIPS*07 large scale learning workshop, and received few encouragements, a more polished (and less ambitious ...) version was presented during the PASCAL meeting in Bled, January 2008; the challenge started end of February.

Then we had an endlessly long time without any submissions at all; and after that a number of submissions that performed worse than random... Ha! Olivier Chapelle managed to beat Vojtech's baseline SVM.

For the following period, a race went on between Olivier and Kristian Woodsend with his incredibly fast Interior Point SVM. In the end Olivier won the race with his Newton SVM which remained top ranking till the end of the challenge.

A long time passed before other submissions appeared. Marc Boulle's Bayesian submission ranked high in accuracy but overall received a low rank due to the computational effort...

At that point, a critical post from John Langford drew some more attention to the challenge (thanks John) and we accordingly introduced the parallel track; which finally interested very few people (!?).

Finally, the number of submissions totaled 44, with 49 participants registered.