We have received a request to add a parallel track to the challenge, which we have now implemented. The required changes to the instructions are highlighted in red.


The overall goal is to develop a two-class classifier that achieves a low error in the shortest possible time using as few datapoints as possible.

  • All participants are required to identify themselves with a login name, full name and a valid Email address. Your email address will not be made public and is only used for registration and announcements (test dataset release, workshop date, awards). The login name and your full name will appear on the evaluation page.
  • In order to be included in the final evaluation, every participant must compete on at least five datasets in a single track, i.e. submit results on the validation/test sets of five problems for the Wild track or for one of the SVM tracks.
  • Every participant will report the computational time associated with every submission (training time and, separately, the time required to compute predictions for the optimal model-parameter setting, excluding data loading times), accompanied by an estimate of their computing power. This estimate is obtained by running a calibration program provided by the challenge organizers.
  • In order to assess more precisely how the accuracy depends on the computational time, every participant is kindly asked to provide the predictions obtained for various fractions of the total computational time T on the biggest dataset he or she considers. Preferred fractions: T/10, T/9, ..., T.
  • After the end of the competition, every participant will provide an extended abstract (4 pages) describing the algorithm.
  • All participants will be requested to provide executable or source code, allowing the organizers to re-run the algorithm under the same computing conditions for a later timing re-calibration and re-run of the top-ten methods.
In order to fairly assess the performance of the SVM-QP solvers, participants in the SVM tracks are asked to comply with two requirements:
  • NO DATA PREPARATION ALLOWED (no feature construction, no feature selection, ...).
  • NO PARAMETER TUNING ALLOWED (the penalization factor C and the Gaussian kernel width tau are set to prescribed values)

Challenge Tasks

The overall goal is to tune your method such that it achieves a low test error, measured by the area over the precision-recall curve, in the shortest possible time and using as few datapoints as possible.

Wild Competition Track

In this track you are free to do anything that leads to more efficient, more accurate methods, e.g. perform feature selection, find effective data representations, use efficient program code, tune the core algorithm, etc. It may be beneficial to use the raw data representation (described below). For each dataset your method competes on, you are requested to:
  • Train on 10^2, 10^3, 10^4, 10^5, 10^6, 10^7 and the maximum number of available datapoints. If your method cannot cope with all dataset sizes, you may skip datasets that are too large. For every training session, the training time and the time required to compute test outputs have to be recorded.
  • Additionally, for the biggest dataset your method can deal with, we ask you to provide ten intermediate time/output recordings. These can be obtained, for example, by training twice and recording time and output after time T/10, T/9, ..., T (where T is the overall training time). Do not forget to include test times in each of the recordings.
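One way to obtain the intermediate recordings is a simple checkpoint loop over the fractions T/10, T/9, ..., T (a minimal sketch; `train_step` and `predict` are placeholders for your own training iteration and prediction routines):

```python
import time

def run_with_checkpoints(train_step, predict, total_time):
    """Train and record outputs at fractions T/10, T/9, ..., T
    of the overall training budget. train_step/predict are
    placeholders for the participant's own routines."""
    # checkpoint times T/10, T/9, ..., T/1
    checkpoints = [total_time / k for k in range(10, 0, -1)]
    recordings = []
    start = time.perf_counter()
    for target in checkpoints:
        # keep training until the next checkpoint time is reached
        while time.perf_counter() - start < target:
            train_step()
        t_train = time.perf_counter() - start
        t0 = time.perf_counter()
        outputs = predict()                 # test outputs for this snapshot
        t_test = time.perf_counter() - t0   # test time must be recorded too
        recordings.append((t_train, t_test, outputs))
    return recordings
```

Each entry holds the elapsed training time, the test time for that snapshot, and the corresponding test outputs.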

SVM Tracks

To allow for a comparison with non-SVM solvers, all SVM methods are required to compete in both tracks; for SVMs, step 1 is therefore to attack the Wild-track task. In addition to what has to be done in the Wild track, we ask you to do the following experiments.
  1. To measure convergence speed, we ask you to redo the Wild-track experiment for a fixed setting of C=0.01 and epsilon=0.01 (and RBF kernel width tau=10), measuring only the primal objective.
  2. To simulate model selection, you are requested to train SVMs for different values of C and the RBF width, keeping epsilon=0.01 fixed and again measuring only the objective value.
Here epsilon denotes the relative duality gap, (obj_primal - obj_dual)/obj_primal < epsilon = 0.01. If your solver has a different stopping condition, choose it reasonably, i.e. such that you expect a similar duality gap. While early stopping is allowed, it may hurt your method in the evaluation: if the objective of your SVM solver deviates too much from the others, i.e. by more than 5% ((your_obj - obj_min)/your_obj < 0.05 must hold), it will get low scores. Note that in the second part (model selection) of the experiment you have to use the data representation obtained by running the svmlight conversion script. The following values for C and the RBF width shall be used:
  • for the Linear SVM Track it is required to train SVMs for Cs=[0.0001, 0.001, 0.01, 0.1, 1, 10]
  • for the RBF Kernel SVM Track it is required to train SVMs for fixed C=0.01 and tau=[0.01, 0.1, 1, 10, 100, 1000], where

    K(x,y)= exp(-||x-y||^2 / tau).
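With numpy, the Gram matrix for this kernel can be computed as follows (a sketch; note that tau divides the squared distance directly, with no factor of 2 as in some other conventions):

```python
import numpy as np

def rbf_kernel(X, Y, tau):
    """Gram matrix for the challenge's RBF kernel
    K(x, y) = exp(-||x - y||^2 / tau)."""
    # squared Euclidean distances via the expansion
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    # clip tiny negative values caused by floating-point round-off
    return np.exp(-np.maximum(sq, 0.0) / tau)
```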
Objective values are computed as follows (written out here as the standard soft-margin SVM objectives; please check the exact definitions against the challenge website):
  • for the Linear SVM Track, the primal objective

    obj = 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i <w, x_i>)

  • for the RBF Kernel SVM Track, the primal objective in kernel expansion, with f(x) = sum_j alpha_j K(x_j, x),

    obj = 1/2 sum_{i,j} alpha_i alpha_j K(x_i, x_j) + C * sum_i max(0, 1 - y_i f(x_i))

Finally, if possible, please include all parameters (rbf-tau, epsilon, SVM-C) for all experiments in the result file.
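The stopping and scoring criteria above translate directly into code (a minimal sketch of the two formulas):

```python
def relative_duality_gap(obj_primal, obj_dual):
    """Relative duality gap as defined for the SVM tracks:
    (obj_primal - obj_dual) / obj_primal; stop when < epsilon."""
    return (obj_primal - obj_dual) / obj_primal

def deviates_too_much(your_obj, obj_min, threshold=0.05):
    """Evaluation check: an objective deviating from the best one
    by 5% or more, i.e. (your_obj - obj_min) / your_obj >= 0.05,
    gets low scores."""
    return (your_obj - obj_min) / your_obj >= threshold
```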

Parallel Track

The recent paradigm shift from single-core to multi-core shared-memory architectures is the focus of the parallel track. For this track, methods have to be trained on 8 CPUs, following the rules outlined for the Wild track. To assess parallelization quality, we additionally ask participants to train their method using 1, 2, 4 and 8 CPUs on the biggest dataset they can process. Note that the evaluation criteria are specifically tuned to parallel shared-memory algorithms: instead of the training time on each CPU, you should measure wall-clock time (including data loading time). In addition, data loading time must be specified in a separate field.
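The timing convention for this track can be sketched as follows (`load_data` and `train` are placeholders for your own routines):

```python
import time

def timed_parallel_run(load_data, train, n_cpus):
    """Parallel-track timing convention (a sketch): report
    wall-clock time *including* data loading, plus the data
    loading time in its own field."""
    t0 = time.perf_counter()
    data = load_data()
    t_load = time.perf_counter() - t0    # goes into the separate field
    train(data, n_cpus)
    t_wall = time.perf_counter() - t0    # includes data loading
    return t_wall, t_load
```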


The raw datasets can be downloaded from ftp://largescale.ml.tu-berlin.de/largescale. The provided script convert.py may be used to obtain a simple (not necessarily optimal) feature representation in svm-light format.

Overview of the Raw Datasets

  Dataset   Training   Validation  Dimensions  Format Description
  alpha     500000     100000      500         ASCII: one example per line,
  beta      500000     100000      500         "val1 val2 ... valdim\n"
  gamma     500000     100000      500
  delta     500000     100000      500
  epsilon   500000     100000      2000
  zeta      500000     100000      2000
  fd        5469800    532400      900         Binary: one example = 900 (fd)
  ocr       3500000    670000      1156        or 1156 (ocr) bytes, values 0..255
  dna       50000000   1000000     200         ASCII: each line is a string of
                                               length 200 over the symbols ACGT
  webspam   350000     50000       variable    Files contain strings (webpages)
                                               separated by '\0', e.g.
                                               "html foo bar .../html\0"

Submission Format

We provide an evaluation script that parses outputs and computes performance scores; the exact same script is used for the live evaluation. It is suggested that you run this script locally on a subset of the training data, both to test whether the submission format is generated correctly and to evaluate your results (note that the script can only be used on data where labels are available, e.g. subsets of the training data). It requires python-numpy and scipy to be installed. Additionally, if matplotlib is installed, the performance figures will be drawn.

The submission format itself is described below:

Explanation of values (Column by Column)

  1. Dataset size - Must match the full size of the dataset or one of 10^2, 10^3, 10^4, 10^5, 10^6, 10^7
  2. Index - Values 0...9 (to distinguish the intermediate recordings obtained while optimizing), used only for the biggest dataset, -1 otherwise; for the SVM tracks, -2 for the C experiment and -3 for the rbf-tau experiment
  3. Traintime - Time required for training (without data loading) in seconds; for the parallel track, wall-clock time (including data loading)
  4. Testtime - Time required for applying the classifier to the validation/test data (without data loading) in seconds; for the parallel track, the data loading time
  5. Calibration - Score obtained using the provided calibration tool (values should be identical when run on the same machine)
  6. Method specific value: SVM Objective
  7. SVM-C / number of CPUs for the parallel track
  8. SVM-rbf-tau
  9. SVM epsilon
    • use 0 (for SVM objective, SVM-C, SVM-rbf-tau and SVM epsilon) if not applicable
    • for SVMs, please fill in the different parameters, i.e. SVM objective, SVM-C, rbf-tau and the stopping-condition epsilon; as the meaning of epsilon may differ, please explain in the description what your epsilon stands for
    • for the Linear SVM Track, index -2 should be used for the C experiment
    • for the Gaussian SVM Track, index -3 should be used when varying rbf-tau
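A helper that assembles the nine values in the order above might look like this (a sketch; the authoritative format check is the provided evaluation script):

```python
def submission_row(size, index, traintime, testtime, calibration,
                   objective=0.0, svm_c=0.0, rbf_tau=0.0, epsilon=0.0):
    """Collect the nine per-run column values described above.
    The SVM fields default to 0 when not applicable."""
    # index: 0..9 for intermediate recordings on the biggest dataset,
    # -1 otherwise, -2 for the C experiment, -3 for the rbf-tau experiment
    assert index in {-3, -2, -1} or 0 <= index <= 9, "invalid index"
    return [size, index, traintime, testtime, calibration,
            objective, svm_c, rbf_tau, epsilon]
```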

Assessment of the submissions

Different procedures will be considered depending on the track.

Wild Competition

For the Wild track, the ideal goal would be to determine the best algorithm in terms of learning accuracy, depending on the time budget allowed. Accordingly, the score of a participant is computed as the average rank of his or her contribution w.r.t. the following six scalar measures:

Time vs. Error This figure plots training time against the area over the precision-recall curve (aoPRC). It is obtained by displaying the different time budgets and the corresponding aoPRC on the biggest dataset. We compute the following scores based on this figure:
  • Minimum aoPRC
  • Area under the Time vs. aoPRC curve
  • The time t at which the aoPRC x falls below the threshold, i.e. (x - overall_minimum_aoPRC)/x < 0.05.
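These three scores can be computed from the recorded (time, aoPRC) pairs as follows (a sketch; `overall_min` stands for the overall minimum aoPRC across all submissions):

```python
import numpy as np

def time_error_scores(times, aoprc, overall_min, rel=0.05):
    """Scores from the time-vs-aoPRC figure: minimum aoPRC, area
    under the curve (trapezoid rule), and the first time at which
    (x - overall_min) / x < rel."""
    times = np.asarray(times, dtype=float)
    aoprc = np.asarray(aoprc, dtype=float)
    min_aoprc = aoprc.min()
    # trapezoid rule for the area under the curve
    auc = float(np.sum((aoprc[1:] + aoprc[:-1]) / 2.0 * np.diff(times)))
    hit = (aoprc - overall_min) / aoprc < rel
    t_thresh = float(times[hit][0]) if hit.any() else None
    return min_aoprc, auc, t_thresh
```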
Size vs. Error This figure plots dataset size against the area over the precision-recall curve (aoPRC), displaying the different dataset sizes and the corresponding aoPRC the method achieves. We compute the following scores based on this figure:
  • Area under the Size vs. aoPRC curve
  • The size s at which the aoPRC x falls below the threshold, i.e. (x - overall_minimum_aoPRC)/x < 0.05.
Size vs. Time This figure plots dataset size against training time, displaying the different dataset sizes and the corresponding training time the method achieves. We compute the following score based on this figure:
  • The slope b of a least-squares fit of the curve to a*x^b.
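The slope b can be obtained by a least-squares fit in log-log space, since time = a*x^b implies log(time) = log(a) + b*log(x):

```python
import numpy as np

def fit_slope(sizes, times):
    """Least-squares fit of time = a * size^b via the log-log
    transform; returns the exponent b."""
    b, log_a = np.polyfit(np.log(sizes), np.log(times), 1)
    return b
```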

SVM Tracks

For the SVM tracks, the point is to determine the best tradeoff between computational effort and learning accuracy. Accordingly, the score of a participant is computed as the average rank of his or her contribution w.r.t. five scalar measures.

Parallel Tracks

The same figures as in the Wild track, but showing wall-clock time, will be used for the parallel track. After the end of the competition, we will additionally plot time vs. number of CPUs.

Further Questions

For further questions feel free to contact ml-largescale at lists dot tu-berlin dot de