Medical Segmentation Decathlon

Generalisable 3D Semantic Segmentation


- The data has been updated to V2.0 on the 4th of July. This new version release includes minor corrections, including the removal of problematic images (Task01 and Task04), fixes to image orientation (Task02 and Task03), JSON parsing verifications (Task01 and Task05), and minor changes to the labels (Task01). Please re-download the V2 data before submitting any results.

- The submission website is now available. Please click the following link.

- The validation metrics are described in the “Assessment Criteria” section below.


With recent advances in machine learning, semantic segmentation algorithms are becoming increasingly general purpose and translatable to unseen tasks. Many key algorithmic advances in the field of medical imaging are commonly validated on a small number of tasks, limiting our understanding of the generalisability of the proposed contributions. A model which works out-of-the-box on many tasks, in the spirit of AutoML, would have a tremendous impact on healthcare. The field of medical imaging is also missing a fully open-source and comprehensive benchmark for general-purpose algorithmic validation and testing, covering a large span of challenges such as small data, unbalanced labels, large-ranging object scales, multi-class labels, and multimodal imaging. This challenge and dataset aim to provide such a resource through the open sourcing of large medical imaging datasets on several highly different tasks, and by standardising the analysis and validation process.


All data will be made available online under a permissive copyright license (CC-BY-SA 4.0), allowing the data to be shared, distributed and improved upon. All data has been labelled and verified by an expert human rater, with the best effort made to mimic the accuracy required for clinical use.

Schedule & Guidelines

The MSD challenge tests the generalisability of machine learning algorithms when applied to 10 different semantic segmentation tasks. The aim is to develop an algorithm or learning system that can solve each task separately, without human interaction. This can be achieved through the use of a single learner, an ensemble of multiple learners, architecture search, curriculum learning, or any other technique, as long as task-specific model parameters are not human-defined. A minimal sketch of such a pipeline is given below.
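To make the constraint concrete, here is a minimal sketch of what such a task-agnostic pipeline could look like. The callables `load_task`, `build_model`, and `save_prediction` are hypothetical placeholders for a participant's own code, not an interface defined by the challenge; the point is that the loop itself is identical for every task.

```python
from pathlib import Path

def train_all_tasks(data_root, load_task, build_model, save_prediction):
    """Run one fixed, task-agnostic pipeline over every task folder.

    `load_task`, `build_model` and `save_prediction` are hypothetical
    participant-supplied callables. Nothing in this loop is edited per
    task: any task-specific setting (learning rate, patch size, ...)
    must be discovered automatically inside `build_model`.
    """
    for task_dir in sorted(p for p in Path(data_root).iterdir() if p.is_dir()):
        train_set, test_set = load_task(task_dir)   # images+labels / images only
        model = build_model(train_set)              # automatic configuration only
        model.fit(train_set)
        for case_id, image in test_set:
            save_prediction(task_dir, case_id, model.predict(image))
```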

The challenge will consist of 2 phases.

Phase 1

Data for 7 tasks will be released on the 11th of May. Participants are expected to download the data, develop a general-purpose learning algorithm, train the algorithm on each task's training data independently without human interaction (no task-specific manual parameter settings), run the learned model on the test data, and submit the segmentation results by the 5th of August. This phase will test how well the developed learner can solve multiple independent tasks.

Phase 2

Teams that have submitted to Phase 1 will be given access to 3 more tasks on the 6th of August. They should train their previously developed algorithm, without any software modifications, on the 3 new tasks, and submit results for these tasks by the 31st of August. This phase will test how well the previously developed learner can generalise to unseen tasks.


Liver Tumours

Target: Liver and tumour
Modality: Portal venous phase CT
Size: 201 3D volumes (131 Training + 70 Testing)
Source: IRCAD Hôpitaux Universitaires
Challenge: Label imbalance with a large (liver) and small (tumour) target

Brain Tumours

Target: Gliomas (necrotic/active tumour and oedema)
Modality: Multimodal, multisite MRI data (FLAIR, T1w, T1gd, T2w)
Size: 750 4D volumes (484 Training + 266 Testing)
Source: BRATS 2016 and 2017 datasets.
Challenge: Complex and heterogeneously-located targets


Hippocampus

Target: Hippocampus head and body
Modality: Mono-modal MRI 
Size: 394 3D volumes (263 Training + 131 Testing)
Source: Vanderbilt University Medical Center
Challenge: Segmenting two neighbouring small structures with high precision 

Lung Tumours

Target: Lung and tumours
Modality: CT
Size: 96 3D volumes (64 Training + 32 Testing)
Source: The Cancer Imaging Archive
Challenge: Segmentation of a small target (cancer) in a large image


Prostate

Target: Prostate central gland and peripheral zone
Modality: Multimodal MR (T2, ADC)
Size: 48 4D volumes (32 Training + 16 Testing)
Source: Radboud University, Nijmegen Medical Centre
Challenge: Segmenting two adjoined regions with large inter-subject variations

Cardiac Data

Target: Left Atrium
Modality: Mono-modal MRI  
Size: 30 3D volumes (20 Training + 10 Testing)
Source: King’s College London
Challenge: Small training dataset with large variability


Pancreas Tumours

Target: Pancreas and tumour
Modality: Portal venous phase CT
Size: 420 3D volumes (281 Training + 139 Testing)
Source: Memorial Sloan Kettering Cancer Center
Challenge: Label imbalance with large (background), medium (pancreas) and small (tumour) structures.

3 Mystery Tasks

In order to test the generalisability of the models, 3 tasks will be released only after the first submission round (see the schedule above), in order to avoid multi-task overfitting.

Assessment Criteria

The challenge aims to optimise algorithms for generalisability, rather than to achieve state-of-the-art performance on all 10 tasks. Thus, a small subset of classical semantic segmentation metrics, in this case the Dice Score (DSC) and a Normalised Surface Distance (NSD), will be used to assess different aspects of the performance on each task and region of interest. These metrics will be combined into a single per-task performance metric, and then meta-analysed to obtain an overall performance metric.

Due to the complexity of the challenge, the metrics (DSC, NSD) were chosen purely for their well-known behaviour, their popularity, and their rank stability. Having simple and rank-stable metrics also allows statistical comparison between methods. Note that the proposed metrics are neither task-specific nor task-optimal, and thus do not fulfil the necessary clinical criteria for the algorithmic validation of each independent task.
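For reference, a simplified sketch of both metrics follows. This is an illustration, not the official evaluation code: the official NSD is computed on surface elements with a task-specific tolerance, whereas this version counts boundary voxels, and the tolerance `tau` and voxel `spacing` below are left to the caller.

```python
import numpy as np
from scipy import ndimage

def dice_score(pred, gt):
    """Dice Score between two binary masks (defined as 1.0 if both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, gt).sum() / denom

def boundary(mask):
    """Boundary voxels: the mask minus its erosion."""
    return mask & ~ndimage.binary_erosion(mask)

def normalised_surface_distance(pred, gt, tau, spacing=(1.0, 1.0, 1.0)):
    """Voxel-based NSD approximation: the fraction of boundary voxels of
    each mask lying within a tolerance `tau` (in mm) of the other mask's
    boundary. Degenerate cases (one empty boundary) are not handled here."""
    pred_b, gt_b = boundary(pred.astype(bool)), boundary(gt.astype(bool))
    # Distance (in mm) from every voxel to the nearest boundary voxel.
    d_to_gt = ndimage.distance_transform_edt(~gt_b, sampling=spacing)
    d_to_pred = ndimage.distance_transform_edt(~pred_b, sampling=spacing)
    close = (d_to_gt[pred_b] <= tau).sum() + (d_to_pred[gt_b] <= tau).sum()
    total = pred_b.sum() + gt_b.sum()
    return 1.0 if total == 0 else close / total
```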

Further details on the statistical analysis methodology will be published soon. 

Organising Team and Data Contributors

M. Jorge Cardoso

King's College London


Memorial Sloan Kettering Cancer Center

Olaf Ronneberger 

Google DeepMind


Technische Universität München

Bram van Ginneken

Radboud University Medical Center


Vanderbilt University


Radboud University Medical Center


National Institutes of Health


National Institutes of Health Clinical Center


DKFZ German Cancer Research Center


DKFZ German Cancer Research Center


CBICA, University of Pennsylvania


University College London

Frequently Asked Questions

Below are some of the most frequently asked questions received over email and other contact media.

Who should I contact if I have any questions?

You should contact the organisers by emailing

Can I change the learning rate or any other parameter (regularisation, depth) per task?

The ultimate aim of the challenge is for the winning algorithm to “just work” on any future unseen task, without requiring any extra inputs or human interaction. Thus, the learning algorithm/script/parameters cannot be changed manually per task. Parameters can only be made task specific if the parameter is found automatically within the learning system itself (e.g. through cross-validation).
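As an example of what "found automatically within the learning system" can mean, here is a minimal sketch of choosing a learning rate by cross-validation. `train_fn` and `score_fn` are hypothetical wrappers around a participant's own learner; the candidate grid is fixed once and reused unchanged for every task, so the selected value is data-driven rather than hand-set.

```python
import numpy as np
from sklearn.model_selection import KFold

def pick_learning_rate(dataset, train_fn, score_fn,
                       candidates=(1e-2, 1e-3, 1e-4)):
    """Select a learning rate by 3-fold cross-validation.

    `train_fn(indices, lr)` returns a fitted model; `score_fn(model,
    indices)` returns a validation score (higher is better). Both are
    hypothetical placeholders. The same candidate grid is applied to
    every task, so no value is hand-tuned per task.
    """
    folds = list(KFold(n_splits=3, shuffle=True, random_state=0).split(dataset))
    means = [np.mean([score_fn(train_fn(tr, lr), va) for tr, va in folds])
             for lr in candidates]
    return candidates[int(np.argmax(means))]
```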

Is it permitted to train with additional datasets that are not part of the challenge?

Using task-specific external data to pre-train the network would be considered a task-specific change and is thus not allowed. You can use a general-purpose pre-trained network to initialise all the tasks, but not a pre-trained network which is task-specific. In short, you can do anything methodologically, as long as it is not task-specific and would work on any future new task.
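For instance, initialising every task from one shared general-purpose checkpoint is fine, whereas loading weights pre-trained specifically for, say, the liver task is not. A minimal PyTorch-flavoured sketch, where the checkpoint name is a hypothetical placeholder:

```python
import torch

SHARED_WEIGHTS = "generic_encoder.pt"  # hypothetical: one checkpoint for ALL tasks

def init_for_task(model: torch.nn.Module) -> torch.nn.Module:
    """Initialise any task's model from the same shared checkpoint.

    Allowed: identical general-purpose weights reused for every task.
    Not allowed: a checkpoint pre-trained on external data chosen for
    one specific task.
    """
    state = torch.load(SHARED_WEIGHTS, map_location="cpu")
    model.load_state_dict(state, strict=False)  # load matching layers only
    return model
```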

Are we supposed to submit the segmentation masks or the code to the challenge?

Only the segmentation masks are required. We encourage participants to make their code available, but this is not a requirement.

Is the validation process handled as a black box, such as a virtual machine?

Yes, a Docker container will perform all the validation on the server. We will release the validation scripts for public scrutiny, but not the validation labels.

Do I need to identify myself when submitting results?

There is no identification requirement for submissions. You will, however, have to provide the participants' names and institutions if you want to be part of a future journal submission describing the outcomes of the challenge.