Challenges

Brought to you by our collaborators ...
... with all websites still alive for post-challenge submissions!

Feature selection (NIPS 2003)

Seventy-five participants competed on five classification problems to make the best predictions and select the smallest possible subset of relevant input variables (features). The tasks include: cancer diagnosis from mass-spectrometry data, handwritten digit recognition, text classification, and drug discovery.

[www] Challenge web site (data available)
[Wsp] Workshop page
[Resu] Result page
[Code] Matlab software and course material
[JMLR] Special issue on feature selection
[Springer] Book edited (+data CD & code)
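
To make the feature selection task above concrete, here is a minimal sketch of a filter method: rank each feature by the absolute Pearson correlation with the label and keep the top k. This is only an illustration, not the method of any participant; the function names (`pearson`, `rank_features`, `select_top_k`) and the toy data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def rank_features(X, y):
    """Return feature indices sorted by decreasing |correlation| with the labels."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append(abs(pearson(column, y)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def select_top_k(X, y, k):
    """Keep the k best-ranked features; return their indices and the reduced data."""
    keep = sorted(rank_features(X, y)[:k])
    return keep, [[row[j] for j in keep] for row in X]

# Toy data (hypothetical): feature 0 tracks the label, feature 1 is constant,
# feature 2 is uncorrelated with the label.
X = [
    [1.0, 5.0, 0.3],
    [0.9, 5.0, 0.8],
    [0.1, 5.0, 0.4],
    [0.2, 5.0, 0.7],
]
y = [1, 1, 0, 0]
keep, X_small = select_top_k(X, y, k=1)
```

A univariate filter like this ignores interactions between features, so it is best read as a baseline rather than a competitive entry.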

Performance prediction (WCCI 2006)
and model selection (NIPS 2006)

One hundred and forty-five participants competed on five classification problems to make the best predictions and predict their generalization performance on new unseen data. The tasks include: marketing, drug discovery, text classification, handwritten digit recognition, and ecology. This first challenge was followed by a model selection game using the same datasets, reshuffled; see ALvsPK.

[www] Challenge web site (data available)
[Wsp] WCCI 2006 wshop ; NIPS 2006 wshop
[Resu] Result page
[Code] Matlab software
[JMLR] Special topic on model selection
[CiML] Book edited (free PDF of CiML vol 1)
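
Predicting generalization performance, as asked of participants in the challenge above, is often approached with k-fold cross-validation. The sketch below is one such estimate under stated assumptions: the helper names (`k_fold_indices`, `cross_val_error`) and the toy 1-D nearest-mean classifier are hypothetical and are not taken from the challenge software.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_error(X, y, train_fn, k=5):
    """Estimate the error rate on unseen data by averaging over k held-out folds."""
    n = len(X)
    errors = 0
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        predict = train_fn(train_X, train_y)
        errors += sum(predict(X[i]) != y[i] for i in test_idx)
    return errors / n

def train_nearest_mean(X, y):
    """Hypothetical 1-D classifier: predict the class whose training mean is closest."""
    means = {}
    for label in set(y):
        values = [x for x, t in zip(X, y) if t == label]
        means[label] = sum(values) / len(values)
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

# Toy data (hypothetical): the classes are interleaved so every fold contains both.
X = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.0, 1.0]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
estimated_error = cross_val_error(X, y, train_nearest_mean, k=5)
```

The estimate is only as good as the assumption that the held-out folds resemble future data; contiguous folds are used here for simplicity, whereas shuffled or stratified folds are more common in practice.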

Agnostic learning vs. prior knowledge, ALvsPK (IJCNN 2007)

This challenge had two tracks: the agnostic learning track and the prior knowledge track, corresponding to two versions of five datasets. The “agnostic track” data was preprocessed in a feature-based representation suitable for off-the-shelf machine learning packages. The “prior knowledge track” had raw data, not always in a feature representation, coming with information about the nature and source of the data. Can you do better with the raw data and prior knowledge about the task? How far can you get with pure “black box learning”?

[www] Challenge web site (data available)
[Wsp] IJCNN 2007 workshop page
[Resu] Results
[Code] Matlab software (CLOP)
[JMLR] Special topic on model selection
[CiML] Book edited (free PDF of CiML vol 1)

Learning causal relationships (WCCI 2008 and NIPS 2008)

What affects your health? What affects the economy? What affects climate change? And… which actions will have beneficial effects? This series of competitions challenged the participants to discover the causes of given effects, based on observational data. The datasets include re-simulation data from models closely resembling real systems and real data for which the causal dependencies are known from experimental evidence. A first challenge on "causation and prediction" featuring 4 datasets (Genomics, Pharmacology, and Census data) was followed by a "pot-luck challenge" in which the participants exchanged tasks. Fifteen datasets are available to study causal problems.

[www] Challenge web site (data available)
[Wsp] WCCI 2008 wshop; NIPS2008 workshop
[Resu] Results
[Code] Causal explorer (Matlab)
[JMLR] JMLR W&CP proceedings vol 3; JMLR W&CP proceedings vol 6
[CiML] Book edited (free PDF of CiML vol 2)

Fast scoring in a large database (KDD cup 2009)

Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offered the opportunity to work on large marketing databases from the French Telecom company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling). This challenge attracted over 450 participants from 46 countries.

[www] Challenge web site (data available)
[Wsp] KDD cup 2009 workshop page
[Resu] Results
[Code] Matlab software (CLOP)
[JMLR] JMLR W&CP proceedings vol 7
[CiML] Book edited (free PDF of CiML vol 3)

Learning to rank challenge from Yahoo! labs (ICML 2010)

The datasets come from web search ranking and are a subset of what Yahoo! uses to train its ranking function. They consist of feature vectors extracted from query-URL pairs along with relevance judgments. The relevance judgments can take 5 different values from 0 (irrelevant) to 4 (perfectly relevant). The queries, URLs, and feature descriptions are not disclosed, only the feature values. The challenge, which ran from March 1 to May 31, attracted a very large participation, with 4,736 submissions coming from 1,055 teams.

[www] Challenge web site (data available)
[Wsp] ICML 2010 workshop page
[Resu] Results
[JMLR] JMLR W&CP proceedings vol 14
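
Graded relevance judgments from 0 to 4, as in the learning to rank data above, are commonly scored with normalized discounted cumulative gain (NDCG). The sketch below shows that computation for a single query. It is an illustration only: the text above does not state the challenge's official metric, and the exponential gain `2**rel - 1`, the function names, and the toy scores are assumptions.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of judgments listed in ranked order (best rank first)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG divided by the DCG of the ideal ordering; 0.0 if nothing is relevant."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def rank_by_score(scores, relevances):
    """Reorder the judgments by decreasing model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [relevances[i] for i in order]

# Toy query (hypothetical): four URLs with model scores and judgments in 0..4.
scores = [0.9, 0.8, 0.3, 0.1]
judgments = [4, 0, 2, 1]
ranked = rank_by_score(scores, judgments)
quality = ndcg(ranked)
```

Here the model places an irrelevant URL second, so `quality` falls below 1; a ranking that sorts the URLs by true relevance scores exactly 1.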

Active Learning Challenge (AISTATS 2010 and WCCI 2010)

Labeling data is expensive, but large amounts of unlabeled data are available at low cost. Such problems might be tackled from different angles: learning from unlabeled data or active learning. In the former case, the algorithms must satisfy themselves with the limited amount of labeled data and capitalize on the unlabeled data with semi-supervised learning methods. In the latter case, the algorithms may place a limited number of queries to get labels. The goal in that case is to optimize the queries to label data and the problem is referred to as active learning.

[www] Challenge website (data available)
[Wsp] AISTATS 2010 wsp; WCCI 2010 wsp
[Resu] Results
[Code] Sample Matlab code
[JMLR] JMLR W&CP proceedings vol 16
[CiML] Book edited (free PDF of CiML vol 6)
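
The active learning setting described above, a limited number of label queries chosen by the algorithm, can be sketched with pool-based uncertainty sampling: at each step, query the unlabeled point the current model is least sure about. This is a toy 1-D illustration under stated assumptions, not the challenge protocol; the threshold model, the `oracle` function, and all names are hypothetical.

```python
def fit_threshold(labeled):
    """Toy model: midpoint between the largest negative and smallest positive example."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (max(neg) + min(pos)) / 2

def active_learn(pool, oracle, seed, budget):
    """Spend at most `budget` label queries, each on the point nearest the boundary."""
    labeled = [(x, oracle(x)) for x in seed]   # seed must contain both classes
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(budget):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: the closest point to the boundary is the least certain.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), len(labeled)

# Toy problem (hypothetical): 101 points in [0, 1], true boundary hidden at 0.37.
pool = [i / 100 for i in range(101)]
oracle = lambda x: 1 if x >= 0.37 else 0
learned, n_labels = active_learn(pool, oracle, seed=[0.0, 1.0], budget=8)
```

With 8 queries plus 2 seed labels, the learned threshold falls between 0.36 and 0.37 on this pool, whereas labeling points at random would typically need far more labels for the same precision.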

Unsupervised and Transfer Learning Challenge, UTL
(ICML 2011 and IJCNN 2011)

This challenge addressed a question of fundamental and practical interest in machine learning: the assessment of data representations produced by unsupervised learning procedures, for use in supervised learning tasks. It also addressed the evaluation of transfer learning methods capable of producing data representations useful across many similar supervised learning tasks, after training on supervised data from only one of them.

[www] Challenge web site (data available)
[Wsp] ICML 2011 wshop; IJCNN 2011 wshop
[Resu] Results
[Code] Sample Matlab code
[JMLR] JMLR W&CP proceedings vol 27
[CiML] Book edited (free PDF of CiML vol 7)

Learn the rhythms, predict the musical scores (KDD cup 2011)

Yahoo! Music has amassed billions of user ratings for musical pieces. When properly analyzed, the raw ratings encode information on how songs are grouped, which hidden patterns link various albums, which artists complement each other, and above all, which songs users would like to listen to. The KDD Cup contest released over 300 million ratings performed by over 1 million anonymized users of Yahoo! The competition attracted more than 2000 contestants with about 1300 teams reaching the final stage of the competition.

[www] Challenge web site (data available)
[Wsp] KDD cup 2011 workshop
[Resu] Results
[JMLR] JMLR W&CP proceedings vol 18

One-Shot-Learning Gesture Challenge (CVPR 2012 and ICPR 2012)

Humans are capable of recognizing patterns like hand gestures after seeing just one example. Can machines do that too?
We are organizing a challenge on gesture and sign language recognition from video data. We are mostly focusing on hand gestures, although facial expressions may enter into account. Applications include recognizing signals for man-machine communication, translating sign languages for the deaf to hearing people, and computer gaming.

[www] Challenge web site (data available)
[Wsp] CVPR2012 and ICPR 2012
[Resu] Round 1 (login:CVPR2012, password:papers) Round 2. Data report.
[Code] Sample code (Matlab)
[JMLR] Special topic on gesture recognition
[CiML]

Cause-Effect Pairs challenge (IJCNN 2013 and NIPS 2013)

Given samples of pairs of variables {A, B}, find whether A is a cause of B.
Consider for instance a target variable B, like occurrence of "lung cancer" in patients. The goal would be to find whether a factor A, like "smoking", might cause B. The objective of the challenge is to rank pairs of variables {A, B} to prioritize experimental verifications of the conjecture that A causes B.

[www] Challenge web site (data available)
[Wsp] IJCNN 2013 and NIPS 2013
[Resu] Results
[Code] Sample code (Python). Winner1: ProtoML. Winner2: Jarfo. Winner3: FirFID.
[JMLR]
[CiML]

Multi-Modal Gesture Challenge (ICMI 2013)

Gestures accompany speech; can they help improve speech recognition?
Kinect is revolutionizing the field of gesture recognition given the set of input data modalities it provides, including RGB image, depth image (using an infrared sensor), and audio. Gesture recognition is genuinely important in many multi-modal interaction and computer vision applications, including image/video indexing, video surveillance, computer interfaces, and gaming. It also provides excellent benchmarks for algorithms.

[www] Challenge web site (data available)
[Wsp] ICMI 2013
[Resu] Results
[Code] Sample code (Matlab)
[JMLR] Special topic on gesture recognition
[CiML]

Neural Connectomics Challenge (WCCI 2014, ECML 2014)

Discover the structure of a neural network from fluorescence imaging of the neural activity. Recovering the exact wiring of the brain (connectome), which includes nearly 100 billion neurons, each having on average 7000 synaptic connections to other neurons, is a daunting task. Using neuroimaging techniques and methods of network reconstruction, including causal discovery algorithms, promises to greatly help neuroanatomy research.

[www] Challenge web site
[Wsp] ECML 2014
[Resu] Draft paper
[Code] Sample code
[JMLR]
[CiML]

ChaLearn Looking at People (ECCV 2014)

Three tracks of challenging computer vision tasks promising to advance how machines look at people:
Track 1: Human Pose Recovery.
Track 2: Action/Interaction Recognition.
Track 3: Gesture Recognition.

[www] Challenge web site
[Wsp] ECCV 2014
[Resu] Results
[Code] Data and sample code
[JMLR]
[CiML]

ChaLearn Fast Causation Coefficient (MS Faculty Summit 2014)

Similar to the cause-effect pairs challenge, but this time, you get to submit code to the challenge platform. Your challenge is to build a fast causation coefficient. The proceedings are shared with the cause-effect pairs challenge.

[www] Challenge web site
[Wsp] Microsoft Faculty Summit 2014
[Resu] Slides
[Code] Directly on platform!

Higgs Boson Challenge (NIPS 2014)

The ATLAS experiment has recently observed a signal of the Higgs boson decaying into two tau particles, but this decay is a small signal buried in background noise.
The goal of the Higgs Boson Machine Learning Challenge is to explore the potential of advanced machine learning methods to improve the discovery significance of the experiment.

[www] Challenge web site
[Wsp] Workshop at NIPS 2014
[Resu] 
[Code] Directly on platform!
[Data] Released from CERN!

AutoML challenge (IJCNN 2015-2016)

The goal of the AutoML challenge is to create a machine capable of learning from examples without any human intervention. This challenge is concerned with regression and classification problems (binary, multi-class, or multi-label) from data already formatted in fixed-length feature-vector representations. The domains include biology and medicine, ecology, energy and sustainability management, image, text, audio, speech, video and other sensor data processing, internet social media management and advertising, market analysis and financial prediction.

[www] Challenge web site
[Wsp] Workshop at NIPS 2014
[Resu] Book chapter
[Code] Directly on platform!
[Data] From AutoML website

ChaLearn Looking at People (LAP challenges)
Check the list of LAP challenges since 2015 on ChaHub

Looking at People (LAP) is a challenging area of research that deals with the problem of recognizing people in images, detecting and describing body parts, inferring their spatial configuration, and performing action/gesture recognition from still images or image sequences, also considering multi-modal data, among others. Any scenario where the visual or multi-modal analysis of people takes a key role is of interest within the field of Looking at People.

[www] Challenge web site
[Wsp] Workshops
[Data] Datasets

ChaLearn AutoML challenge series
Check the list of AutoML challenges since 2014 on ChaHub

Since 2014, the AutoML track has worked to stimulate the community to address the problem of creating ML algorithms that work without any human intervention. This means choosing models, architectures, hyper-parameters, etc. completely automatically. There are statistical challenges (avoiding over-fitting) and computational challenges (quickly searching a large space of possibilities).

[www] Challenge web site
[Data] From AutoML website

ChaLearn Physics challenge series
Check the list of Physics challenges since 2014 on ChaHub

The Higgs Boson challenge launched a series of challenges in High Energy Physics.
To explore what our universe is made of, scientists at CERN are colliding protons, essentially recreating mini big bangs, and meticulously observing these collisions with intricate silicon detectors. While orchestrating the collisions and observations is already a massive scientific accomplishment, analyzing the enormous amounts of data produced from the experiments is becoming an overwhelming challenge. So far we co-organized challenges to detect and characterize new particles like the Higgs Boson and to help trace particle trajectories in the Large Hadron Collider (LHC).

Other resources:
Search on CHAHUB or Google data search, or search challenge-related websites by keyword.

Data mining competitions:
A list of data mining competitions maintained by KDnuggets, including the well known KDD cup.

Platforms hosting/posting challenges:
Kaggle: The most popular hosting platform.
Tunedit: Similar platform more academically oriented (phased out?).
DrivenData: For non-profit challenges.
Codalab: For academic challenges of greater complexity.
Beat: An EU-sponsored platform.
Epidemium: challenges in epidemiology.
Pascal challenges: The Pascal network is sponsoring several challenges in Machine learning.
Challenges.gov: Challenges sponsored by the US Government.
Ecole Normale Superieure: Datasets and challenges.
Beaker notebook: Convert back and forth from R/Python/Javascript
Cortana Intelligence: Azure ML platform.
RAMP studio: The Paris-Saclay CDS Rapid Analytics Model Prototyping platform.
Synapse: The platform on which DREAM challenges are organized.

Collaborative platforms:
OpenML: share ML reusable frameworks.
MLcomp: compare machine learning programs.
E-lico: data mining portal.
H2O: open source predictive analytics platform.
KNIME: Data mining platform.
Quantopian: Financial data simulator + ML tutorials.

Crowdsourcing:
Amazon Mechanical Turk: Lets you hire people from all around the world to solve your tasks. Used to label computer vision data.
Figure eight: Hire people to collect, filter and enhance data.

International conferences hosting challenges:
WCCI: World congress on computational intelligence.
ICDAR: International Conference on Document Analysis and Recognition, a bi-annual conference proposing a contest in printed text recognition. Feature extraction/selection is a key component to win such a contest.
ICPR: In conjunction with the International Conference on Pattern Recognition, ICPR 2004, a face recognition contest is being organized.
ICMI: Competitions on multimodal interaction

Popular challenges:
NNGC: Neural Network Grand Challenge in time series forecasting.
Netflix: The 1 million dollar Netflix prize, which attracted a lot of attention and broke new grounds for recommender systems.
Robocup: Robots that play soccer, in a contest held every year.
DELVE: A platform developed at the University of Toronto to benchmark machine learning algorithms.
CAMDA: Critical Assessment of Microarray Data Analysis, an annual conference on gene expression microarray data analysis. This conference includes a contest with emphasis on gene selection, a special case of feature selection.
TREC: Text Retrieval conference, organized every year by NIST. The conference is organized around the result of a competition. Past winners have had to address feature extraction/selection effectively.
CASP: An important competition in protein structure prediction called Critical Assessment of Techniques for Protein Structure Prediction.
ICAPS competitions: Competitions in planning and knowledge engineering
MEDIAEVAL benchmarks: Benchmarking Initiative for Multimedia Evaluation. Data sharing in multimediacommons (with incremental annotations). Uses Amazon web services to allow experimentation in the cloud.
DREAM: Dialogue for Reverse Engineering Assessments and Methods. Challenges in gene network reconstruction.
AVEC: Audio visual Emotion Recognition Challenge and Workshop.
CAFA: Predicting function of biological macromolecules (as well as gene-disease associations).

Data resources:
Computer vision datasets
UCI machine learning repository: A great collection of datasets for machine learning research.
KEEL: Knowledge Extraction based on Evolutionary Learning.
Amazon datasets: Public datasets hosted by Amazon.
IO Data Science: Datasets of Paris-Saclay University.
Archive.org: Free books, movies, software, music, websites, and more.
