We have received a request to add a parallel track to the challenge, which we have now implemented. The required changes to the instructions are highlighted in red.
General
The overall goal is to develop a 2-class classifier such that it achieves a low error in the shortest possible time using as few datapoints as possible.
Challenge Tasks
The overall goal is to tune your method such that it achieves a low test error, measured by the area over the precision recall curve, in the shortest possible time and using as few datapoints as possible.
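As a rough illustration of the error measure named above, the sketch below computes an area over the precision recall curve as one minus a step-wise area under that curve. It is only an assumption-laden sketch: the function name `area_over_prc`, the +1/-1 label encoding, and the step-wise integration are illustrative choices, and the official evaluation script may interpolate the curve or handle ties differently.

```python
def area_over_prc(scores, labels):
    """Return 1 - (step-wise area under the precision-recall curve).

    scores: real-valued classifier outputs, higher means "more positive".
    labels: true labels, +1 for positive and -1 for negative (assumed encoding).
    """
    n_pos = sum(1 for y in labels if y > 0)
    if n_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank examples from most to least confidently positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    true_pos = 0
    area_under = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label > 0:
            true_pos += 1
            # Each positive raises recall by 1/n_pos at the current precision.
            area_under += (true_pos / rank) / n_pos
    return 1.0 - area_under
```

A ranking that places every positive example above every negative one gives a value of 0; lower is better.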
Wild Competition Track
In this track you are free to do anything that leads to more efficient, more accurate methods, e.g. perform feature selection, find effective data representations, use efficient program code, tune the core algorithm etc. It may be beneficial to use the raw data representation (described below). For each dataset your method competes in you are requested to:
SVM Tracks
To allow for a comparison w.r.t. non-SVM solvers, all SVM methods are required to compete in both tracks; therefore, for SVMs, step 1 is to attack the wild-track task. In addition to what has to be done in the wild track, we ask you to do the following experiments.
Parallel Track
The recent paradigm shift of learning from single to multi-core shared memory architectures is the focus of the parallel track. For the parallel track, methods have to be trained on 8 CPUs following the rules outlined in the wild track. To assess the parallelization quality, we additionally ask participants to train their method using 1, 2, 4, and 8 CPUs on the biggest dataset they can process. Note that the evaluation criteria are specifically tuned to parallel shared memory algorithms, i.e. instead of training time on each CPU you should measure wall-clock time (including data loading time). In addition, data loading time must be specified in a separate field.
Datasets
The raw datasets can be downloaded from ftp://largescale.ml.tu-berlin.de/largescale. The provided script convert.py may be used to obtain a simple (not necessarily optimal) feature representation in svm-light format.
Overview of the Raw Datasets
| Dataset | Training | Validation | Dimensions | Format Description |
|---|---|---|---|---|
| alpha | 500000 | 100000 | 500 | Files are in ascii format, where each line corresponds to 1 example, i.e. val1 val2 ... valdim\n |
| beta | 500000 | 100000 | 500 | Same as alpha |
| gamma | 500000 | 100000 | 500 | Same as alpha |
| delta | 500000 | 100000 | 500 | Same as alpha |
| epsilon | 500000 | 100000 | 2000 | Same as alpha |
| zeta | 500000 | 100000 | 2000 | Same as alpha |
| fd | 5469800 | 532400 | 900 | Files are binary; to obtain 1 example read 900 bytes (values 0..255) |
| ocr | 3500000 | 670000 | 1156 | Files are binary; to obtain 1 example read 1156 bytes (values 0..255) |
| dna | 50000000 | 1000000 | 200 | Files are ascii; each line contains a string of length 200 (symbols ACGT), i.e. CATCATCGGTCAGTCGATCGAGCATC...A\n |
| webspam | 350000 | 50000 | variable | Files contain strings (webpages) separated with 0, i.e. html foo bar .../html\0 |
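The raw formats in the table can be read with a few lines of standard-library Python. The following is a minimal sketch, not the provided convert.py: all function names are hypothetical, the parsers work on data already held in memory (for the full files you would stream instead), and label files are not covered because their layout is not described here.

```python
def parse_dense_ascii(text):
    """alpha..zeta style: one example per line, whitespace-separated values."""
    return [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]


def parse_fixed_binary(data, dim):
    """fd/ocr style: each example is `dim` consecutive bytes with values 0..255."""
    if len(data) % dim != 0:
        raise ValueError("byte count is not a multiple of the example size")
    return [list(data[i:i + dim]) for i in range(0, len(data), dim)]


def parse_dna(text, length=200):
    """dna style: one string over the symbols ACGT per line."""
    examples = text.splitlines()
    for s in examples:
        if len(s) != length or set(s) - set("ACGT"):
            raise ValueError("unexpected DNA line: %r" % s[:20])
    return examples


def parse_webspam(data):
    """webspam style: variable-length strings separated by a 0 byte."""
    return [page for page in data.split(b"\0") if page]
```

For example, `parse_fixed_binary(open(path, "rb").read(), 900)` would split an fd file into 900-value examples, where `path` is wherever you stored the download.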
Submission Format
We provide an evaluation script that parses outputs and computes performance scores. We use this exact same script to do the live evaluation. It is suggested to run this script locally on a subset of the training data to test whether the submission format is correctly generated and to evaluate the results (note that the script can only be used on data where labels are available, e.g. subsets of the training data). It requires python-numpy and scipy to be installed. Additionally if matplotlib is installed, the performance figures will be drawn.
The submission format is described below:
- Wild Competition (download example submission)
dataset_size0 -1 traintime0 testtime0 calibration0 objective C epsilon rbf-tau output0 output1 ...
...
dataset_sizeN -1 traintimeN testtimeN calibrationN objective C epsilon rbf-tau output0 output1 ...
dataset_sizeN index0 traintime0 testtime0 calibration0 objective C epsilon rbf-tau output0 output1 ...
...
dataset_sizeN index9 traintime9 testtime9 calibration9 objective C epsilon rbf-tau output0 output1 ...
- SVM Tracks (download example submission)
A submission to the SVM track requires taking part in the wild competition, i.e. the submission file must start with the lines required for the wild competition. A single empty line then announces the SVM-track-specific data:
>a single empty line here distinguishes wild from model specific track
dataset_size0 -1 traintime0 testtime0 calibration0 objective C epsilon rbf-tau
...
dataset_sizeN -1 traintimeN testtimeN calibrationN objective C epsilon rbf-tau
dataset_sizeN index1 traintime1 testtime1 calibration1 objective C epsilon rbf-tau
...
dataset_sizeN index9 traintime9 testtime9 calibration9 objective C epsilon rbf-tau
dataset_sizeN -2 traintime0 testtime0 calibration0 objective C1 epsilon rbf-tau
...
dataset_sizeN -2 traintimeK testtimeK calibrationK objective CK epsilon rbf-tau
(or -3 and rbf-tau 1 ... K)
- Parallel Track
A submission to the parallel track consists of two parts. The first part has the exact same syntax as the wild track (with time = wall-clock time). The second part contains the 1, 2, 4, 8 CPU experiment run on the biggest dataset (here the C column denotes the number of CPUs).
>a single empty line here distinguishes wild from model specific track
dataset_sizeN -4 walltimeN dataloadingtimeN calibrationN objective 1 epsilon rbf-tau
dataset_sizeN -4 walltimeN dataloadingtimeN calibrationN objective 2 epsilon rbf-tau
dataset_sizeN -4 walltimeN dataloadingtimeN calibrationN objective 4 epsilon rbf-tau
dataset_sizeN -4 walltimeN dataloadingtimeN calibrationN objective 8 epsilon rbf-tau
Explanation of values (Column by Column)
- Dataset size - Values must match size of dataset or 10^[2,3,4,5,6,7]
- Index - Values of 0...9 (to distinguish values obtained while optimizing) only for the biggest dataset, -1 otherwise; for the SVM track, -2 for the C experiment and -3 for the rbf-tau experiment; for the parallel track, -4 for the CPU experiment
- Traintime - Time required for training (without data loading) in seconds / wall-clock time (including data loading) for the parallel track
- Testtime - Time required for applying the classifier to the validation/test data (without data loading) in seconds / data loading time for the parallel track
- Calibration - Score obtained using the provided calibration tool (values should be the same if run on the same machine)
- Method specific value: SVM Objective
- SVM-C / number of CPUs for the parallel track
- SVM-rbf-tau
- SVM epsilon
- use 0 (for SVM Objective, SVM-C, SVM-rbf-tau, SVM epsilon) if not applicable
- for SVM, please fill in the different parameters, i.e. SVM objective, SVM-C, rbf-tau and stopping condition epsilon. As the meaning of epsilon may differ, please explain in the description what epsilon stands for
- for the linear SVM track, index -2 should be used when doing the experiment for C
- for the Gaussian SVM track, index -3 should be used when modifying rbf-tau
- The number of output values in columns 10 onward must match the size of the validation/test set.
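The column layout above can be illustrated with a small helper that writes one submission line and a checker that applies the stated constraints. This is a hedged sketch, not the official evaluation script: both function names are hypothetical, the column order follows the template lines (objective, C, epsilon, rbf-tau), and the checker covers only the dataset size, index range, and output count.

```python
def format_submission_line(dataset_size, index, traintime, testtime, calibration,
                           objective=0, c=0, epsilon=0, rbf_tau=0, outputs=()):
    """Build one whitespace-separated submission line.

    Columns 1-9 are the header values; columns 10 onward are classifier outputs.
    Method-specific values default to 0, meaning "not applicable".
    """
    header = [dataset_size, index, traintime, testtime, calibration,
              objective, c, epsilon, rbf_tau]
    return " ".join(str(v) for v in header + list(outputs))


def check_submission_line(line, full_size, n_validation):
    """Return a list of problems found in a wild-track line (empty if none)."""
    fields = line.split()
    if len(fields) < 9:
        return ["fewer than 9 header columns"]
    problems = []
    # Dataset size must be the full dataset size or one of 10^2 .. 10^7.
    allowed_sizes = {10 ** k for k in range(2, 8)} | {full_size}
    if int(fields[0]) not in allowed_sizes:
        problems.append("dataset size is neither the full size nor 10^2..10^7")
    # Indices used in the templates: 0..9, -1, -2, -3, and -4.
    if int(fields[1]) not in range(-4, 10):
        problems.append("index outside -4..9")
    if len(fields) - 9 != n_validation:
        problems.append("number of outputs does not match validation set size")
    return problems
```

Running the checker on a few generated lines before uploading is a cheap complement to testing with the provided evaluation script.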
Assessment of the submissions
Different procedures will be considered depending on the track.
Wild Competition
For the Wild track, the ideal goal would be to determine the best algorithm in terms of learning accuracy, depending on the time budget allowed. Accordingly, the score of a participant is computed as the average rank of its contribution wrt the six scalar measures:
This figure measures training time vs. area over the precision recall curve (aoPRC). It is obtained by displaying the different time budgets and their corresponding aoPRC on the biggest dataset. We compute the following scores based on that figure:
This figure measures dataset size vs. area over the precision recall curve (aoPRC). It is obtained by displaying the different dataset sizes and their corresponding aoPRC that the methods achieve. We compute the following scores based on that figure:
This figure measures dataset size vs. training time. It is obtained by displaying the different dataset sizes and the corresponding training time that the methods achieve. We compute the following scores based on that figure:
SVM Tracks
For the SVM track, the point is to determine the best tradeoff between the computational effort and the learning accuracy. Accordingly, the score of a participant is computed as the average rank of its contribution wrt the five scalar measures:
- Minimal objective
- Area under the Time vs. Objective Curve
- Time to reach objective within 5% tolerance, i.e. minimal t for (t,obj) with (obj-overall_min_objective)/obj<0.05
- Average Training Time for all C/Sigma
- Computational Effort (scaling with dataset size)
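The "time to reach objective within 5% tolerance" measure in the list above can be written down directly from its definition. The sketch below is illustrative only: the function name and the representation of the curve as a list of (t, obj) pairs are assumptions, and it presumes positive objective values so that the division is well defined.

```python
def time_to_tolerance(curve, overall_min_objective, tolerance=0.05):
    """Smallest t among (t, obj) points with (obj - overall_min) / obj < tolerance.

    curve: iterable of (time, objective) pairs; objectives assumed positive.
    Returns None if no point on the curve is within tolerance.
    """
    hits = [t for t, obj in curve
            if (obj - overall_min_objective) / obj < tolerance]
    return min(hits) if hits else None
```

For a curve [(1.0, 200.0), (5.0, 104.0), (20.0, 100.5)] and an overall minimum objective of 100.0, the first point misses the tolerance (relative gap 0.5) and the second meets it (about 0.038), so the measure is 5.0.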
Parallel Tracks
The same figures as in the wild track, but showing wall-clock time, will be used for the parallel track. After the end of the competition we will in addition plot time vs. number of CPUs.
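When preparing the 1, 2, 4, 8 CPU experiment, participants may find it useful to summarize their own wall-clock times as speedup and parallel efficiency. These are the standard textbook definitions, not a criterion stated by the organizers; the function name and input layout below are hypothetical.

```python
def parallel_scaling(wall_times):
    """Map each CPU count to (speedup, efficiency) relative to the 1-CPU run.

    wall_times: dict {num_cpus: wall-clock seconds, including data loading}.
    speedup = T(1) / T(p); efficiency = speedup / p.
    """
    baseline = wall_times[1]
    return {p: (baseline / t, baseline / t / p)
            for p, t in sorted(wall_times.items())}
```

With times of 800 s on 1 CPU and 200 s on 8 CPUs, the speedup is 4 and the efficiency is 0.5.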
Further Questions
For further questions feel free to contact ml-largescale at lists dot tu-berlin dot de.
