Pre-process PanNuke Dataset for Nuclei Instance Segmentation and Classification

PanNuke is an H&E-stained image set containing 7,904 patches of 256 × 256 pixels from a total of 19 different tissue types. The nuclei are classified as neoplastic, inflammatory, connective/soft tissue, dead, or epithelial cells. The dataset is divided into three folds:

  • Fold 1 contains 2,657 images
  • Fold 2 contains 2,524 images
  • Fold 3 contains 2,723 images

More info here

You can download the dataset from here

Access the TSFD paper and the original paper

Sample image from the original repo


Three different dataset splits are then made from these three folds. In each split, two folds are used for training and validation and one fold is used for testing.
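The three splits can be enumerated by holding out each fold once. This is only an illustrative sketch: the helper name `fold_splits` is hypothetical, and how the two remaining folds are divided between training and validation is left open because it is not specified here.

```python
def fold_splits(folds=(1, 2, 3)):
    """Yield (train_val_folds, test_fold) pairs, holding out each fold once."""
    for test_fold in folds:
        train_val = tuple(f for f in folds if f != test_fold)
        yield train_val, test_fold


for train_val, test_fold in fold_splits():
    print(f"train/val on folds {train_val}, test on fold {test_fold}")
```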

Preprocessing

After downloading the data you will get three .zip files, namely fold1, fold2 and fold3. The data is stored as NumPy arrays. After extraction, the directory structure is as follows:

📦Fold 1
 ┣ 📂images
 ┃ ┗ 📂fold1
 ┃ ┃ ┣ 📜images.npy
 ┃ ┃ ┗ 📜types.npy
 ┣ 📂masks
 ┃ ┣ 📂fold1
 ┃ ┃ ┗ 📜masks.npy
 ┃ ┣ 📜by-nc-sa.md
 ┃ ┗ 📜README.md
 ┗ 📜README.md
# Fold 2 and 3 also have similar structure
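Given the layout above, the three arrays of one fold can be located as in the sketch below. The helper names `fold_files` and `missing_files` are hypothetical; the directory and file names are taken from the tree shown above.

```python
from pathlib import Path


def fold_files(fold_root, fold):
    """Return the expected .npy paths for one extracted fold (fold = 1, 2 or 3)."""
    root = Path(fold_root)
    name = f"fold{fold}"
    return {
        "images": root / "images" / name / "images.npy",
        "types": root / "images" / name / "types.npy",
        "masks": root / "masks" / name / "masks.npy",
    }


def missing_files(fold_root, fold):
    """List the expected arrays that are not present on disk."""
    return [key for key, path in fold_files(fold_root, fold).items()
            if not path.is_file()]
```

Since the data is stored as NumPy arrays, each returned path can then be read with `numpy.load`.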

Method 1

If you want to use the official splits, run the process_pannuke_std.py script from the scripts dir. Just specify the input and output paths as follows:

data_dir = '../PanNuke/data/' # location of the extracted folds
output_dir = '../Folds/' # location to save the output data

This script will create a dir containing the three folds. The data will be converted from .npy and saved in .png format. Each fold will be structured as follows:

📦Fold 1
 ┣ 📂images
 ┣ 📂inst_masks
 ┗ 📂sem_masks

Inside each dir the files will be named as follows, respectively:

img_Colon_2_01594.png

inst_Adrenal_gland_2_01041.png

sem_Bile-duct_2_01420.png
  • The first word gives the file type:
    1. inst means the mask contains instance information in the form of boundaries
    2. sem means it is a semantic mask
    3. img means it is an H&E image
  • The next word, e.g. Adrenal_gland, is the tissue type
  • The next number, 2, indicates that the image comes from the 2nd fold of the original dataset
  • The last number, 01041, is the image number
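The naming scheme can be built and parsed with a few lines of Python. This is a minimal sketch, not code from the scripts: the function names are hypothetical, and the five-digit zero padding is assumed from the example file names. Because tissue names such as Adrenal_gland contain an underscore, the parser splits the prefix from the left and the fold and image number from the right.

```python
from pathlib import Path

KINDS = ("img", "inst", "sem")


def format_patch_name(kind, tissue, fold, index):
    """Build a file name such as 'inst_Adrenal_gland_2_01041.png'."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{kind}_{tissue}_{fold}_{index:05d}.png"


def parse_patch_name(filename):
    """Split a file name into (kind, tissue, fold, index)."""
    stem = Path(filename).stem
    kind, rest = stem.split("_", 1)            # the prefix never contains '_'
    tissue, fold, index = rest.rsplit("_", 2)  # the tissue name may contain '_'
    return kind, tissue, int(fold), int(index)


print(parse_patch_name("inst_Adrenal_gland_2_01041.png"))
# ('inst', 'Adrenal_gland', 2, 1041)
```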

Method 2

In this method I split the dataset tissue-wise, so the data is saved in 19 directories, one per tissue type. To generate this data, run the process_pannuke.py script from the scripts dir. Just specify the input and output paths as follows:

data_dir = '../PanNuke/data/' # location of the extracted folds
output_dir = '../processed/' # location to save the output data

This script will create a dir containing 19 sub-dirs as follows:

📦processed
 ┣ 📂Adrenal_gland
 ┣ 📂Bile-duct
 ┣ 📂Bladder
 ┣ 📂Breast
 ┣ 📂Cervix
 ┣ 📂Colon
 ┣ 📂Esophagus
 ┣ 📂HeadNeck
 ┣ 📂Kidney
 ┣ 📂Liver
 ┣ 📂Lung
 ┣ 📂Ovarian
 ┣ 📂Pancreatic
 ┣ 📂Prostate
 ┣ 📂Skin
 ┣ 📂Stomach
 ┣ 📂Testis
 ┣ 📂Thyroid
 ┗ 📂Uterus

The data will be converted from .npy and saved in .png format. Each tissue dir will be structured as follows:

 📂Uterus
 ┃ ┣ 📂images
 ┃ ┣ 📂inst_masks
 ┃ ┗ 📂sem_masks

The file naming is the same as in Method 1.
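For reference, the tissue-wise layout described above can be created up front as in the sketch below. It only builds the empty directory skeleton and does not convert any data; `make_tissue_dirs` is a hypothetical helper, and the tissue and sub-dir names are copied from the trees above.

```python
from pathlib import Path

TISSUES = [
    "Adrenal_gland", "Bile-duct", "Bladder", "Breast", "Cervix", "Colon",
    "Esophagus", "HeadNeck", "Kidney", "Liver", "Lung", "Ovarian",
    "Pancreatic", "Prostate", "Skin", "Stomach", "Testis", "Thyroid", "Uterus",
]
SUBDIRS = ("images", "inst_masks", "sem_masks")


def make_tissue_dirs(output_dir):
    """Create <output_dir>/<tissue>/<subdir> for every tissue; return the paths."""
    created = []
    for tissue in TISSUES:
        for sub in SUBDIRS:
            path = Path(output_dir) / tissue / sub
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```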

Custom Splits

Now that the data is saved tissue-wise, you can split the dataset into train, val and test splits. To do so, run the spliy_pannuke.py script from the scripts dir. Specify the input/output directories and the split ratios, i.e. how much of the data you would like to use for val and test.

op_dir = '../splits/' # output dir for splits
data_dir = '../processed/' # dir containing the tissue-wise data from Method 2


test_split = 0.20 # 20% of total data
val_split = 0.1   # 10% of total data

Now op_dir will have the following structure:

📦splits
 ┣ 📂test
 ┃ ┣ 📂images
 ┃ ┣ 📂inst_masks
 ┃ ┗ 📂sem_masks
 ┣ 📂train
 ┃ ┣ 📂images
 ┃ ┣ 📂inst_masks
 ┃ ┗ 📂sem_masks
 ┗ 📂val
 ┃ ┣ 📂images
 ┃ ┣ 📂inst_masks
 ┃ ┗ 📂sem_masks
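When files are moved into this structure, an image and its two masks have to land in the same split. The sketch below shows one way to keep them together, deriving the mask names from the image name via the prefixes from Method 1. It is an assumption-laden illustration, not the script's code: `copy_patch` and `PREFIXES` are hypothetical names.

```python
import shutil
from pathlib import Path

# Sub-directory name -> file-name prefix, following the naming scheme of Method 1.
PREFIXES = {"images": "img", "inst_masks": "inst", "sem_masks": "sem"}


def copy_patch(tissue_dir, image_name, split_dir):
    """Copy an image and its two masks from a tissue dir into a split dir."""
    suffix = image_name.split("_", 1)[1]  # e.g. 'Colon_2_01594.png'
    copied = []
    for sub, prefix in PREFIXES.items():
        src = Path(tissue_dir) / sub / f"{prefix}_{suffix}"
        dst = Path(split_dir) / sub / src.name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

For example, `copy_patch('../processed/Colon', 'img_Colon_2_01594.png', '../splits/test')` would copy the image together with its inst and sem masks into the test split.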

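Note: This script splits the data tissue-wise, i.e. each tissue type is split separately, with 20% of its images used for test. Judging by the counts below, the 10% for val appears to be taken from the images that remain after the test split, so val ends up at roughly 8% of each tissue's total.

If you run the script with the above values, the splits are as follows: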

Total Images Found in Adrenal_gland  = 437
========================================
Training Images   = 314
Testing Images    = 88
Validation Images = 35
////////////////////////////////////////
Total Images Found in Bile-duct  = 420
========================================
Training Images   = 302
Testing Images    = 84
Validation Images = 34
////////////////////////////////////////
Total Images Found in Bladder  = 146
========================================
Training Images   = 104
Testing Images    = 30
Validation Images = 12
////////////////////////////////////////
Total Images Found in Breast  = 2351
========================================
Training Images   = 1692
Testing Images    = 471
Validation Images = 188
////////////////////////////////////////
Total Images Found in Cervix  = 293
========================================
Training Images   = 210
Testing Images    = 59
Validation Images = 24
////////////////////////////////////////
Total Images Found in Colon  = 1440
========================================
Training Images   = 1036
Testing Images    = 288
Validation Images = 116
////////////////////////////////////////
Total Images Found in Esophagus  = 424
========================================
Training Images   = 305
Testing Images    = 85
Validation Images = 34
////////////////////////////////////////
Total Images Found in HeadNeck  = 384
========================================
Training Images   = 276
Testing Images    = 77
Validation Images = 31
////////////////////////////////////////
Total Images Found in Kidney  = 134
========================================
Training Images   = 96
Testing Images    = 27
Validation Images = 11
////////////////////////////////////////
Total Images Found in Liver  = 224
========================================
Training Images   = 161
Testing Images    = 45
Validation Images = 18
////////////////////////////////////////
Total Images Found in Lung  = 184
========================================
Training Images   = 132
Testing Images    = 37
Validation Images = 15
////////////////////////////////////////
Total Images Found in Ovarian  = 146
========================================
Training Images   = 104
Testing Images    = 30
Validation Images = 12
////////////////////////////////////////
Total Images Found in Pancreatic  = 195
========================================
Training Images   = 140
Testing Images    = 39
Validation Images = 16
////////////////////////////////////////
Total Images Found in Prostate  = 182
========================================
Training Images   = 130
Testing Images    = 37
Validation Images = 15
////////////////////////////////////////
Total Images Found in Skin  = 187
========================================
Training Images   = 134
Testing Images    = 38
Validation Images = 15
////////////////////////////////////////
Total Images Found in Stomach  = 146
========================================
Training Images   = 104
Testing Images    = 30
Validation Images = 12
////////////////////////////////////////
Total Images Found in Testis  = 196
========================================
Training Images   = 140
Testing Images    = 40
Validation Images = 16
////////////////////////////////////////
Total Images Found in Thyroid  = 226
========================================
Training Images   = 162
Testing Images    = 46
Validation Images = 18
////////////////////////////////////////
Total Images Found in Uterus  = 186
========================================
Training Images   = 133
Testing Images    = 38
Validation Images = 15
////////////////////////////////////////
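The counts in the log above are consistent with taking the test share from the full tissue set and the validation share from what remains, rounding up each time. The sketch below reproduces that arithmetic. It is inferred from the printed numbers rather than taken from the script, and `split_counts` is a hypothetical helper.

```python
import math
from fractions import Fraction


def split_counts(total, test_split=0.20, val_split=0.1):
    """Return (train, test, val) counts for one tissue type."""
    # Convert via str so that 0.2 becomes exactly 1/5 (no float rounding).
    test_frac = Fraction(str(test_split))
    val_frac = Fraction(str(val_split))
    test = math.ceil(total * test_frac)         # share of the full set
    val = math.ceil((total - test) * val_frac)  # share of the remainder
    return total - test - val, test, val


print(split_counts(437))   # Adrenal_gland -> (314, 88, 35)
print(split_counts(2351))  # Breast -> (1692, 471, 188)
```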
