Challenge Summary *under construction*
Next step: JMLR Special issue on Large Scale Learning

Submission to the special issue is open to everyone, provided that you follow the challenge setting to allow a fair comparison. See Data sets and performance measures.
Results of the Challenge were announced at the LSL Workshop, ICML 2008:
 Wild track
 SVM track
 Additional best student and best newcomer awards

The re-evaluation is still ongoing, but a leading group has emerged, which allows us to declare the winners of the challenge, each receiving prize money or a travel grant. As the best methods perform very similarly (almost identical scores), we do not rank the winners.
The overall score of each participant is computed as the average rank of their submission w.r.t. six scalar measures that aggregate predictive accuracy and computational effort over various dataset sizes. For more details see Assessment of the submissions and the discussion below.
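As a minimal sketch of this aggregation rule (the measure values below are made up for illustration, not actual challenge scores):

```python
# Average-rank aggregation: each submission is ranked on every measure
# (lower value = better = rank 1), then ranks are averaged across measures.
import numpy as np

def overall_scores(measures):
    """measures: array of shape (n_submissions, n_measures), lower is better.
    Returns the average rank of each submission across all measures."""
    # double argsort turns raw values into within-column ranks (1 = best)
    ranks = np.argsort(np.argsort(measures, axis=0), axis=0) + 1
    return ranks.mean(axis=1)

# three hypothetical submissions evaluated on six scalar measures
m = np.array([
    [0.10, 0.20, 0.15, 0.30, 0.12, 0.25],  # submission A
    [0.12, 0.18, 0.16, 0.28, 0.11, 0.27],  # submission B
    [0.50, 0.60, 0.55, 0.70, 0.52, 0.65],  # submission C
])
print(overall_scores(m))  # → [1.5 1.5 3. ]  (C is last on every measure)
```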
What, who, how...
Interestingly, most participants used linear algorithms, variants of L1/L2-SVM solvers:
 SGD-QN (Antoine Bordes)
 Newton (Olivier Chapelle)
 Dual coordinate descent (Hsiang-Fu Yu)
 Triple Jump SVM (Han-Shen Huang and Chun-Nan Hsu)
 Interior point (Kristian Woodsend)
Among the non-linear methods are:
 Parallel decision tree (Yossi Richter)
 Averaging of selective naive Bayes (Marc Boullé)
 Averaging of RBF-kernel SVM (Jochen Garcke)
While non-linear methods usually achieve better predictive accuracy than linear ones, they are often significantly slower; all the more so as, due to the low data dimensionality, 100k examples were often enough for the linear algorithms to converge.
This table displays the best performing methods on the challenge datasets.
Dataset   Best known method         Best submission
          model       aoPRC         aoPRC    method               submitter
alpha     quadratic   0.009774      0.0854   AV SVM               garcke
beta      quadratic   0.304563      0.4577   AV SVM single        garcke
gamma     semi-quad   0.009418      0.0116   Avg NB Classif.      MB
delta     semi-quad   0.071005      0.0801   Avg NB Classif.      MB
epsilon   linear      0.033645      0.0341   IPM SVM 2            kristian
zeta      linear      0.011361      0.0116   SDM SVM L1           yahoo
dna       WD-SVM      0.4558        0.8030   SDM SVM L2           yahoo
ocr       n.n.        n.n.          0.1579   SDM SVM L2           yahoo
fd        n.n.        n.n.          0.2314   Avg NB Classif. CR   MB
webspam   n.n.        n.n.          0.0004   SDM SVM L2           yahoo
Generally, people used the (simple-minded) data representation we suggested in our svmlight-format convert.py script. No one even tried quadratic features. Thus there is often a huge gap between what is possible and what was achieved (as we could see on the artificial datasets, where we knew the optimal Bayes classifier).
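For readers unfamiliar with the idea: an explicit quadratic feature map lets a purely linear solver learn a quadratic decision boundary on the original inputs. A minimal sketch (illustrative only, not the challenge's tooling):

```python
# Explicit quadratic feature expansion: append all pairwise products
# x_i * x_j to the original features. A linear model trained on the
# expanded vectors corresponds to a quadratic model on the raw inputs.
import itertools
import numpy as np

def quadratic_features(x):
    """Map a feature vector x to [x, all products x_i * x_j, i <= j]."""
    pairs = [x[i] * x[j] for i, j in
             itertools.combinations_with_replacement(range(len(x)), 2)]
    return np.concatenate([x, pairs])

x = np.array([1.0, 2.0, 3.0])
print(quadratic_features(x))  # → [1. 2. 3. 1. 2. 3. 4. 6. 9.]
# dimensionality grows from d to d + d*(d+1)/2, which is the practical
# obstacle on high-dimensional data
```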
The programming languages used are C/C++, Java and Matlab. The Wild track and the alpha synthetic dataset were the most attractive, with 263 entries and 72 submissions respectively. The most active participants so far are Kristian Woodsend (Interior Point SVM: linear, parallel, wild track), Antoine Bordes (SGD-QN, LaRank), and Olivier Chapelle and Sathiya Keerthi (SDM L1/L2, Newton). The Parallel track unfortunately did not attract much interest (see below).
Workshop and Discussion
The Pascal workshop on Large Scale Learning took place after ICML 2008, in Helsinki. All 9 talks and the discussion were recorded and will be available at Pascal video lectures. The discussion focused on two main points: aggregating the scores, and what comes next.
Scores, criteria, cheating...
The overall results were summarized by three figures: time vs. aoPRC; dataset size vs. training time; and dataset size vs. aoPRC. These reflect the main questions of the end user: how long does it take to reach a given performance; how does the computational effort increase with dataset size; and how much data is needed to reach a given accuracy. The second figure (dataset size vs. training time) is used to compute the empirical computational complexity, which raises a question: smart stopping criteria could be used to reach a sublinear complexity... is this fair?
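The empirical complexity can be read off that figure as the slope of a log-log fit: if training time grows as size^b, then b is the exponent. A small sketch, with made-up timings:

```python
# Estimate the empirical computational complexity from (size, time) pairs:
# fit log(time) ≈ a + b*log(size); the slope b means time ∝ size^b.
import numpy as np

def empirical_complexity(sizes, times):
    b, a = np.polyfit(np.log(sizes), np.log(times), 1)
    return b  # b ≈ 1: linear scaling, b < 1: sublinear, b > 1: superlinear

# hypothetical training times, roughly linear in the dataset size
sizes = np.array([1e4, 1e5, 1e6, 1e7])
times = np.array([0.5, 5.2, 49.0, 510.0])
print(round(empirical_complexity(sizes, times), 2))  # → 1.0
```

A smart stopping criterion that processes only part of the data would push b below 1 here, which is exactly the fairness question raised in the discussion.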
The current aggregation of ranks is detrimental to methods which are accurate but slow; worse, the time needed to adjust the algorithm's parameters is not accounted for, which is especially unfair to parameterless methods (e.g., Bayesian ones)... How should the cost of model selection be accounted for?
Worst of all, by submitting a fast and stupid method, you could scramble the ranks... and destroy the ordering. Clearly, algorithm assessment is a multi-objective problem and there is no way we can come up with a single fair total order. It is best to look at the plots and find your own optimal trade-off, given the amount of data and the computational budget you have.
Next Challenge
Everybody seemed interested in having a second round, even though, as could have been expected, nobody was willing to take it on. Shall we? We don't know yet... On one hand, the infrastructure is up and running; on the other hand, it is difficult to obtain LARGE public datasets and to adjust the criteria accordingly...
Some points raised during the discussion:
 LSL *must* go parallel; reserve a few nodes of the PASCAL cluster for the challenge and consider fixed time budgets of e.g. 10, 100, 1000 and 10000 seconds;
 People might submit code to be run on the PASCAL cluster (no mistakes about the computational effort; no need to distribute huge datasets...);
 The cost of model selection must definitely be accounted for;
 The aggregation would proceed by normalizing each of the six current scores and averaging the normalized scores (preventing the scrambling-based cheating...).
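To illustrate why normalized scores resist the scrambling attack while average ranks do not, here is a toy sketch with two measures and made-up values (the real challenge uses six measures; the baseline values are hypothetical):

```python
# Rank averaging vs. averaging scores normalized against a fixed reference.
# Under rank averaging, a deliberately bad "scrambler" entry can reorder
# the genuine submissions; normalizing each measure by a fixed baseline
# makes every score depend only on the submission's own numbers.
import numpy as np

BASELINE = np.array([0.20, 100.0])  # hypothetical reference per measure

def avg_rank(measures):
    ranks = np.argsort(np.argsort(measures, axis=0), axis=0) + 1
    return ranks.mean(axis=1)

def normalized_score(measures):
    return (measures / BASELINE).mean(axis=1)

A = [0.10, 100.0]  # accurate but slow          (aoPRC, seconds)
B = [0.20, 10.0]   # faster, less accurate
C = [0.90, 50.0]   # "scrambler": bad accuracy, middling time

print(avg_rank(np.array([A, B])))          # → [1.5 1.5]  A and B tie
print(avg_rank(np.array([A, B, C])))       # → [2.  1.5 2.5]  C's presence favours B
print(normalized_score(np.array([A, B])))  # → [0.75 0.55]
print(normalized_score(np.array([A, B, C])))  # A and B keep the same scores
```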
Help and any ideas are welcome. Do you want to help organize a LSveryL Challenge in 2009-2010? Please contact us.
How the LSL challenge evolved
After an initial proposal was presented at the NIPS*07 large scale learning workshop and received little encouragement, a more polished (and less ambitious...) version was presented during the PASCAL meeting in Bled, January 2008; the challenge started at the end of February.
Then we had an endlessly long time without any submissions at all; and after that, a number of submissions that performed worse than random... Finally, Olivier Chapelle managed to beat Vojtech's baseline SVM.
For the following period, a race went on between Olivier and Kristian Woodsend with his incredibly fast Interior Point SVM. In the end, Olivier won the race with his Newton SVM, which remained top-ranked until the end of the challenge.
A long time passed before other submissions appeared. Marc Boullé's Bayesian submission ranked high in accuracy, but received a low overall rank due to its computational effort...
At that point, a critical post from John Langford drew more attention to the challenge (thanks John), and we accordingly introduced the Parallel track, which in the end interested very few people (!?).
Finally, the number of submissions totaled 44, with 49 participants registered.