License: CC BY-NC-SA 4.0
This repository contains the UterUS dataset, a collection of 3D ultrasound volumes of the uterus with manual annotations of uterine cavity. It serves as a resource for researchers working on uterine morphology analysis and ultrasound/medical image analysis. It also includes the nnUNet segmentation models trained on the presented dataset.
The dataset consists of three main directories with volumes in Neuroimaging Informatics Technology Initiative (NIfTI-2) format (.nii.gz) located in the dataset directory:
- annotated_volumes: 141 files representing ultrasound volumes of the uterus that have corresponding segmentation annotations
- annotations: 141 segmentation volumes corresponding to the annotated volumes
- volumes_without_annotations: 174 ultrasound volumes without associated annotations
The volumes are named by the anonimized number of the medical center and the number of the sample (e.g. 04_12.nii.gz comes from medical center number 4 and sample number 12). The info.csv file provides information about the volumes, including:
- Ultrasound machine model
- Group classification (G - general population, M - consecutive miscarriages, I - unexplained infertility, RIF - recurrent implantation failure)
While some information is currently missing, the dataset will be regularly updated upon its availability. The majority of volumes currently belong to the general (G) group, as that was the main focus of the initial study. This means that the data is not yet well-suited for classification tasks that require a more diverse set of samples. However, we might be expanding the dataset to include more representative data from all groups.
The UterUS dataset stems from a collaborative research project across several European medical centers under the initiative "Normal UteRine asSEssment" (NURSE) aiming to define the normal uterus. The data originates from female subjects categorized into four groups:
- General population (G): women aged 18-28 who have never been pregnant (to avoid age-related uterine anomalies)
- Unexplained infertility (I): women experiencing at least 6 months of unexplained infertility despite normal semen analysis, tubal patency, and ovulation
- Recurrent miscarriages (M): women with a history of 2 or more spontaneous miscarriages with normal karyotype and absence of antiphospholipid syndrome or thrombophilia
- recurrent implantation failure (RIF): women with a history of 2 or more implantation failures
Exclusion criteria included:
- History of pregnancy
- Presence of acquired uterine abnormalities (e.g., myomas, adenomyosis)
- Endometrial polyps (inclusion permitted after polypectomy)
- Unicornuate uterus
Examinations were conducted during the proliferative phase of the menstrual cycle for all participants.
While most volumes were acquired using General Electric (GE) ultrasound machines, a portion of the data originates from Samsung devices, making the dataset multi-device. GE volumes were converted to the NIfTI format using the Slicer3D extension SlicerHeart. Samsung volumes were converted with a dedicated tool developed by their R&D team.
We thank the gynecologists from the following institutions for providing the data: Gynecology Clinic Gergolet, Alma Mater Studiorum - University of Bologna, University of Naples, Imperial College London, Complutense University of Madrid, Clinica Victoria in Malaga, the Gynecology Clinic at the University of Udine, the Gynecology Clinic at the University of Bari, and the Gynecology Department at the General Hospital of Forli. We also thank Michele Losappio, Marco De Pasquale and Samsung's R&D team for all the help with converting their volumes for our usage.
Using the presented dataset, we trained a nnUNet model for segmentation of the uterine region. You can get the trained models in 2D and 3D from this link.
We used a subset of the UterUS dataset consisting of 141 annotated volumes for training and evaluation. The training-validation split was 80/20, resulting in 113 volumes for training and 28 for validation. The 3d_fullres model achieved an average Dice coefficient of 0.909 ± 0.046 on the validation set, 0.899 ± 0.076 on the test set and 0.863 ± 0.043 on the dataset consisting of data from medical centers not included in the training or testing.
You can run the prediction yourself after setting up nnUNet as presented in their repository and adding the model directories in the nnUNet_results/Datasetxxx_name/ directory.
nnUNetv2_predict -d Datasetxxx_name -i INPUT_FOLDER -o OUTPUT_FOLDER -tr nnUNetTrainer -c 3d_fullres -p nnUNetPlans
or changing 3d_fullres into 2d for running the 2D model prediction (though as expected, it produces worse results).