ICML'08 Workshop PASCAL Large Scale Learning Challenge -- July 9, 2008
Topics: Large scale learning; Bounded-resource learning.
Motivation
With the exceptional increase in computing power, storage capacity and network bandwidth over the past decades, ever-growing datasets are collected in fields such as bioinformatics (splice sites, gene boundaries, etc.), IT security (network traffic) or text classification (spam vs. non-spam), to name but a few. While this growth in data size leaves computational methods as the only viable way of dealing with data, it poses new challenges to ML methods.
This workshop is concerned with the scalability and efficiency of existing ML approaches with respect to computational, memory or communication resources. Such resource demands may result, for example, from high algorithmic complexity, from the size or dimensionality of the data set, or from the trade-off between distributed resolution and communication costs.
Many comparisons are indeed presented in the literature; however, these usually assess only a few algorithms or consider only a few datasets. Further, they usually involve different evaluation criteria, model parameters and stopping conditions. As a result, it is difficult to determine how a method behaves and compares with others in terms of test error, training time and memory requirements, which are the practically relevant criteria.
In the context of the PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning) European Network of Excellence, a Challenge is organized to enable a fair and principled assessment of existing large scale classifiers (http://largescale.first.fraunhofer.de).
The Large Scale Learning Workshop at ICML will serve to disseminate the challenge results and announce the winners of the competition. Authors of the best and most original contributions will present their work. Furthermore, a panel discussion will be devoted to establishing a principled framework for the validation of large scale learning methods.
Workshop Program (Workshop Day is July 9, 2008; location S14, 3rd floor)
Morning Session:
| Time | Talk | Abstract | Slides |
|---|---|---|---|
| 08:30 - 09:15 | Welcome and Presentation of Results (Organizers) | | slides |
| 09:15 - 10:00 | Ronan Collobert - Large Scale Learning Which Is Actually Useful | | slides |
| 10:00 - 10:15 | Coffee Break | | |
| 10:15 - 10:35 | Jochen Garcke - AV SVM | abstract | slides |
| 10:35 - 11:05 | Hsiang-Fu Yu - liblinear | abstract | slides |
| 11:05 - 11:35 | Yossi Richter - Parallel Decision Tree | abstract | slides |
Afternoon Session:
| Time | Talk | Abstract | Slides |
|---|---|---|---|
| 14:00 - 14:30 | Han-Shen Huang and Chun-Nan Hsu - Triple Jump Linear SVM | abstract | slides |
| 14:30 - 15:00 | Marc Boulle - Averaging of Selective Naive Bayes Classifiers | abstract | slides |
| 15:00 - 15:45 | Chih-Jen Lin - Training Support Vector Machines: Status and Challenges | | slides |
| 15:45 - 16:00 | Coffee Break | | |
| 16:00 - 16:03 | Kristian Woodsend - Interior Point SVM (presented by Soeren Sonnenburg) | abstract | slides |
| 16:03 - 16:30 | Olivier Chapelle, Sathiya Keerthi - SDM SVM L1/2 and Newton SVM (presented by Chih-Jen Lin) | abstract, abstract | slides |
| 16:30 - 17:00 | Antoine Bordes - SGD-QN, LaRank | abstract, abstract | slides |
| 17:00 - 18:00 | Discussion and Summary | | slides |
Bold - PASCAL invited speaker
Organizers
- Soeren Sonnenburg, TU Berlin, Berlin, Germany
- Vojtech Franc, Czech Technical University, Prague, Czech Republic
- Elad Yom-Tov, IBM Haifa Research Lab, Haifa, Israel
- Michele Sebag, LRI, Orsay, France
