Medical Segmentation Decathlon

Generalisable 3D Semantic Segmentation


- The Decathlon challenge paper is now published in Nature Communications and on arXiv

- The Decathlon dataset specific paper is also on

- A new rolling competition and leaderboard are now available here


With recent advances in machine learning, semantic segmentation algorithms are becoming increasingly general purpose and translatable to unseen tasks. Many key algorithmic advances in the field of medical imaging are commonly validated on a small number of tasks, limiting our understanding of the generalisability of the proposed contributions. A model which works out-of-the-box on many tasks, in the spirit of AutoML, would have a tremendous impact on healthcare. The field of medical imaging is also missing a fully open source and comprehensive benchmark for general purpose algorithmic validation and testing covering a large span of challenges, such as small data, unbalanced labels, large-ranging object scales, multi-class labels, and multimodal imaging. This challenge and dataset aim to provide such a resource through the open sourcing of large medical imaging datasets on several highly different tasks, and by standardising the analysis and validation process.



All data will be made available online with a permissive copyright-license (CC-BY-SA 4.0), allowing for data to be shared, distributed and improved upon. All data has been labeled and verified by an expert human rater, and with the best effort to mimic the accuracy required for clinical use. To cite this data, please refer to

Schedule & Guidelines

The MSD challenge tests the generalisability of machine learning algorithms when applied to 10 different semantic segmentation tasks. The aim is to develop an algorithm or learning system that can solve each task, separately, without human interaction. This can be achieved through the use of a single learner, an ensemble of multiple learners, architecture search, curriculum learning, or any other technique, as long as task-specific model parameters are not human-defined.

Participants are expected to download the data, develop a general purpose learning algorithm, train the algorithm on each task's training data independently without human interaction (no task-specific manual parameter settings), run the learned model on the test data, and submit the segmentation results.
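The workflow above can be sketched as a single driver loop. This is only an illustration: `run_decathlon`, `learn` and `predict` are hypothetical names, and the toy threshold learner stands in for a real segmentation method. The point it shows is that the learning call is identical for every task and takes no task-specific arguments.

```python
def run_decathlon(tasks, learn, predict):
    """Apply one fixed learning procedure to every task, one task at a time.

    tasks maps a task name to (training_data, test_data). learn and predict
    are hypothetical callables supplied by the participant.
    """
    results = {}
    for name, (train_data, test_data) in tasks.items():
        # Identical call for every task: nothing here depends on the task name.
        model = learn(train_data)
        results[name] = [predict(model, case) for case in test_data]
    return results


# Toy stand-in for a learner: the "model" is an intensity threshold set to the
# mean training intensity, so it is derived from the data, not set by hand.
def learn(train_data):
    intensities = [intensity for intensity, _label in train_data]
    return sum(intensities) / len(intensities)


def predict(threshold, intensity):
    return int(intensity > threshold)


tasks = {
    "task_a": ([(1, 0), (3, 1)], [0, 5]),
    "task_b": ([(10, 0), (30, 1)], [25, 5]),
}
results = run_decathlon(tasks, learn, predict)
```

Each task ends up with its own learned threshold (2.0 and 20.0 here), but both come from the same unmodified procedure.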



Liver Tumours

Target: Liver and tumour
Modality: Portal venous phase CT
Size: 201 3D volumes (131 Training + 70 Testing)
Source: IRCAD Hôpitaux Universitaires
Challenge: Label unbalance with a large (liver) and small (tumour) target


Brain Tumours

Target: Gliomas segmentation necrotic/active tumour and oedema
Modality: Multimodal multisite MRI data (FLAIR, T1w, T1gd, T2w)
Size: 750 4D volumes (484 Training + 266 Testing)
Source: BRATS 2016 and 2017 datasets.
Challenge: Complex and heterogeneously-located targets



Hippocampus

Target: Hippocampus head and body
Modality: Mono-modal MRI 
Size: 394 3D volumes (263 Training + 131 Testing)
Source: Vanderbilt University Medical Center
Challenge: Segmenting two neighbouring small structures with high precision 


Lung Tumours

Target: Lung and tumours
Modality: CT
Size: 96 3D volumes (64 Training + 32 Testing)
Source: The Cancer Imaging Archive
Challenge: Segmentation of a small target (cancer) in a large image



Prostate

Target: Prostate central gland and peripheral zone
Modality: Multimodal MR (T2, ADC)
Size: 48 4D volumes (32 Training + 16 Testing)
Source: Radboud University, Nijmegen Medical Centre
Challenge: Segmenting two adjoining regions with large inter-subject variations



Cardiac

Target: Left Atrium
Modality: Mono-modal MRI  
Size: 30 3D volumes (20 Training + 10 Testing)
Source: King’s College London
Challenge: Small training dataset with large variability


Pancreas Tumour

Target: Pancreas and tumour
Modality: Portal venous phase CT
Size: 420 3D volumes (281 Training + 139 Testing)
Source: Memorial Sloan Kettering Cancer Center
Challenge: Label unbalance with large (background), medium (pancreas) and small (tumour) structures. 


Colon Cancer

Target: Colon Cancer Primaries
Modality: CT  
Size: 190 3D volumes (126 Training + 64 Testing)
Source: Memorial Sloan Kettering Cancer Center
Challenge: Heterogeneous appearance


Hepatic Vessels

Target: Hepatic vessels and tumour
Modality: CT
Size: 443 3D volumes (303 Training + 140 Testing)
Source: Memorial Sloan Kettering Cancer Center
Challenge: Tubular small structures next to heterogeneous tumour. 



Spleen

Target: Spleen
Modality: CT  
Size: 61 3D volumes (41 Training + 20 Testing)
Source: Memorial Sloan Kettering Cancer Center
Challenge: Large ranging foreground size

Assessment Criteria

The challenge aims to optimise algorithms for generalisability rather than to achieve state-of-the-art performance on all 10 tasks. Thus, a small subset of classical semantic segmentation metrics, in this case the Dice Score (DSC) and a Normalised Surface Distance (NSD), will be used to assess different aspects of the performance of each task and region of interest. These metrics are implemented here. The metrics will be combined to provide a single task performance metric, and then meta-analysed to obtain an overall performance metric. Further details on the statistical analysis methodology are described here.
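As an illustration of the first metric, the Dice Score for one label is twice the overlap between prediction and reference divided by the sum of their sizes. The sketch below is a minimal standard-library version and not the challenge's own implementation: `dice_score` is a hypothetical name, volumes are assumed to be flattened into equal-length label sequences, and scoring two empty regions as 1.0 is an assumed convention.

```python
def dice_score(prediction, reference, label=1):
    """Dice Score (DSC) for one label between two equal-length label sequences."""
    if len(prediction) != len(reference):
        raise ValueError("prediction and reference must have the same number of voxels")
    pred_count = sum(1 for p in prediction if p == label)
    ref_count = sum(1 for r in reference if r == label)
    overlap = sum(1 for p, r in zip(prediction, reference) if p == label and r == label)
    if pred_count + ref_count == 0:
        # Assumed convention: a label absent from both volumes counts as perfect agreement.
        return 1.0
    return 2.0 * overlap / (pred_count + ref_count)


# Flattened toy volume: 0 = background, 1 = liver, 2 = tumour.
reference = [0, 1, 1, 1, 2, 2, 0, 0]
prediction = [0, 1, 1, 1, 2, 1, 0, 0]

liver_dsc = dice_score(prediction, reference, label=1)   # 2 * 3 / (4 + 3) = 6/7
tumour_dsc = dice_score(prediction, reference, label=2)  # 2 * 1 / (1 + 2) = 2/3
```

The score is computed per region of interest, which is why the small tumour label is penalised heavily for a single missed voxel while the larger liver label is not.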

Due to the complexity of the challenge, the metrics (DSC, NSD) were chosen purely due to their well-known behaviour, their popularity, and their rank stability [see]. Having simple and rank-stable metrics also allows statistical comparison between methods. Note that the proposed metrics are neither task-specific nor task-optimal, and thus do not fulfil the necessary clinical criteria for algorithmic validation of each independent task.
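The second metric can be illustrated in the same spirit. The sketch below is a simplified, voxel-counting reading of a Normalised Surface Distance: the fraction of boundary voxels of each segmentation that lie within a tolerance of the other segmentation's boundary. It is an assumption-laden illustration, not the challenge's implementation, which may weight surface elements differently. The names `boundary` and `normalised_surface_distance`, the representation of a mask as a set of integer coordinates, and the brute-force distance search are all choices made for this example.

```python
from math import dist


def boundary(voxels):
    """Foreground voxels with at least one face-neighbour outside the object."""
    voxels = set(voxels)
    surface = set()
    for v in voxels:
        for axis in range(len(v)):
            for step in (-1, 1):
                neighbour = v[:axis] + (v[axis] + step,) + v[axis + 1:]
                if neighbour not in voxels:
                    surface.add(v)
    return surface


def normalised_surface_distance(prediction, reference, tolerance, spacing=None):
    """Fraction of boundary voxels lying within `tolerance` of the other boundary.

    prediction and reference are collections of integer coordinate tuples.
    spacing optionally gives the voxel size per axis, so that the tolerance
    is expressed in physical units (assumed isotropic unit spacing otherwise).
    """
    surf_pred, surf_ref = boundary(prediction), boundary(reference)
    if not surf_pred and not surf_ref:
        return 1.0  # assumed convention: two empty masks agree perfectly
    if not surf_pred or not surf_ref:
        return 0.0
    ndim = len(next(iter(surf_ref)))
    spacing = spacing or (1.0,) * ndim

    def scaled(surface):
        return [tuple(c * s for c, s in zip(v, spacing)) for v in surface]

    def count_within(source, target):
        # Brute force nearest-boundary search: fine for a toy, slow for real volumes.
        return sum(1 for p in source if min(dist(p, q) for q in target) <= tolerance)

    pts_pred, pts_ref = scaled(surf_pred), scaled(surf_ref)
    close = count_within(pts_pred, pts_ref) + count_within(pts_ref, pts_pred)
    return close / (len(pts_pred) + len(pts_ref))


# Toy 2D example: a 3x3 square and the same square shifted by one voxel.
reference = {(x, y) for x in range(3) for y in range(3)}
prediction = {(x + 1, y) for x, y in reference}

nsd_strict = normalised_surface_distance(prediction, reference, tolerance=0.0)  # 0.5
nsd_loose = normalised_surface_distance(prediction, reference, tolerance=1.0)   # 1.0
```

Unlike the Dice Score, this measure looks only at boundaries, so a one-voxel shift is fully forgiven once the tolerance reaches one voxel.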

Organising Team and Data Contributors


M. Jorge

King's College London

Role: Leadership, Conceptual Design, Data Pre-Processing, Stats and Metrics Committee



Memorial Sloan Kettering Cancer Center

Role: Data Donation (Pancreas, Colon Cancer, Hepatic Vessels, Spleen), Conceptual Design


Olaf Ronneberger 

Google DeepMind

Role: Conceptual Design, Metrics Committee



Technische Universität München

Role: Data Donation (Brain Tumours, Liver Tumours), Stats and Metrics Committee


van Ginneken

Radboud University Medical Center

Role: Data Donation (Prostate), Conceptual Design



Vanderbilt University

Role: Data Donation (Hippocampus), Conceptual Design, Metrics Committee



Radboud University Medical Center

Role: Data Donation (Prostate)



National Institutes of Health

Role: Data Donation (Lung Tumours)



National Institutes of Health Clinical Center

Role: Conceptual Design



DKFZ German Cancer Research Center

Role: Conceptual Design, Statistical Analysis Committee



DKFZ German Cancer Research Center

Role: Conceptual Design, Statistical Analysis Committee



CBICA, University of Pennsylvania

Role: Data Donation (Brain Tumours)



University College London

Role: Conceptual Design, Data Preparation, Validation System Setup, Challenge Day-to-day Support

Frequently Asked Questions

Below are some of the most frequently asked questions received over email or other contact media.

Who should I contact if I have any questions?

You should contact the organisers by emailing

Can I change the learning rate or any other parameter (regularisation, depth) per task?

The ultimate aim of the challenge is for the winning algorithm to “just work” on any future unseen task, without requiring any extra inputs or human interaction. Thus, the learning algorithm/script/parameters cannot be changed manually per task. Parameters can only be made task specific if the parameter is found automatically within the learning system itself (e.g. through cross-validation).
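One way to read this rule is that a parameter may differ between tasks only when the same automatic procedure chose it. The sketch below shows such a procedure using k-fold cross-validation. It is an illustration under stated assumptions: `k_fold_indices`, `select_by_cross_validation` and `fit_and_score` are hypothetical names, and the toy scorer stands in for training and validating a real model.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous, near-equal validation folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def select_by_cross_validation(samples, candidates, fit_and_score, k=5):
    """Return (candidate, score) with the highest mean validation score.

    fit_and_score(train, val, candidate) -> float is a hypothetical callable
    that trains on `train` with the candidate setting and scores on `val`.
    """
    folds = k_fold_indices(len(samples), k)
    best_candidate, best_score = None, float("-inf")
    for candidate in candidates:
        scores = []
        for val_idx in folds:
            held_out = set(val_idx)
            train = [s for i, s in enumerate(samples) if i not in held_out]
            val = [samples[i] for i in val_idx]
            scores.append(fit_and_score(train, val, candidate))
        if mean(scores) > best_score:
            best_candidate, best_score = candidate, mean(scores)
    return best_candidate, best_score


# Toy scorer: the candidate is an intensity threshold and the score is the
# validation accuracy. A real learner would also use the training split.
def threshold_accuracy(train, val, threshold):
    return mean(1.0 if int(x > threshold) == y else 0.0 for x, y in val)


samples = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
best, score = select_by_cross_validation(samples, [0.0, 0.5, 1.0], threshold_accuracy, k=3)
```

Running the same function, with the same candidate list, on every task keeps the selection free of human intervention even though the selected value may differ per task.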

Is it permitted to train with additional datasets that are not part of the challenge?

Using task-specific external data to pre-train the network would be considered a task-specific change and is thus not allowed. You can use a general purpose pre-trained network to initialise all the tasks, but not a pre-trained network which is task specific. In short, you can do anything methodologically as long as it is not task-specific and would work on any future new task.

Are we supposed to submit the segmentation masks or the code to the challenge?

Only segmentation masks are required. We encourage participants to make their code available, but this is not a requirement.

Is the validation process handled as a black box, such as a virtual machine?

Yes, a Docker container will perform all the validation on the server. We will release the validation scripts for public scrutiny, but not the validation labels.

Do I need to identify myself when submitting results?

There are no identifiability requirements for submissions. You will, however, have to provide the participants' names and institutions if you want to be part of a future journal submission describing the outcomes of the challenge.