Authors: Tielin Zhang, Yue Zhang, Likai Tang (CASIA). Contact: [email protected], http://bii.ia.ac.cn/~tielin.zhang
0 [0.0, 0.0, 0.0] 1 -1 7.64 0 0.0 0.0
1 [6.54, 3.93, 0.0] 1 0 7.64 -1 7.63 0.0
2 [-6.54, -3.93, 0.0] 1 0 7.64 -1 7.63 0.0
3 [-4.89, 11.54, -0.27] 4 0 1.09 1 12.54 71.13
4 [2.08, -13.49, 6.19] 3 0 0.54 1 14.99 66.08
……
Structure of each line in an swc file:
['id', 'P', 'type', 'parent', 'width', 'branch_level', 'path_length', 'degree'], where:
id: ID of the node
P: the 3D position; the soma is at (0, 0, 0)
type: 1 = soma (core), 2 = axon, 3 = (basal) dendrite, 4 = apical dendrite
parent: ID of the node's parent
width: diameter of the neurite at this node
branch_level: branching level of the node; -1 = end point
path_length: path distance from the node to the soma (center node)
degree: angle between the two child branches
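The field layout above can be illustrated with a small parser. `parse_swc_line` is a hypothetical helper written for this README, not a function from the repo; it only assumes the field order listed above.

```python
# Hypothetical helper: parse one line of the SWC-style format described above.
# Field order: id, P (x, y, z), type, parent, width, branch_level,
# path_length, degree.
def parse_swc_line(line):
    tokens = line.replace('[', ' ').replace(']', ' ').replace(',', ' ').split()
    return {
        'id': int(tokens[0]),
        'P': (float(tokens[1]), float(tokens[2]), float(tokens[3])),
        'type': int(tokens[4]),
        'parent': int(tokens[5]),
        'width': float(tokens[6]),
        'branch_level': int(float(tokens[7])),
        'path_length': float(tokens[8]),
        'degree': float(tokens[9]),
    }

node = parse_swc_line("3 [-4.89, 11.54, -0.27] 4 0 1.09 1 12.54 71.13")
# node['type'] == 4 (apical dendrite), node['parent'] == 0
```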
In the swc_data folder:
swc_v0: raw data crawled from the site
swc_v1: first version of the data, repaired by software
swc_v3: final version of the data, repaired by software
swc_fake: virtual neuron data generated by software
In the png_data folder:
png_v0: raw neuronal image data crawled from the site
png_v1: image data generated from swc_v1
png_fake: virtual neuron data generated by software
png_r: resampled_rat_img, resampled images of neurons; the treated neurons look simpler
png_fc: fixed_coordinate_rat_img, XY-aligned images of neurons
The swc folders follow the same directory layout as the png folders.
For example, there are two folders, train and test, under swc_v0.
Under train there are 5 folders (the primary cell classes), such as principal cell, each containing neurons of the same primary class.
Under principal cell there are several folders (the secondary cell classes), such as pyramidal, each containing neurons of the same secondary class.
Under pyramidal there are several folders (the tertiary cell classes), such as Not reported, each containing neurons of the same tertiary class.
Under Not reported there are several swc files.
To divide a dataset, put all swc or png files in the same folder and run scratch.py; it splits the dataset into train and test sets and into the major and sub-classes. Modify the last two path parameters of divide_dataset('./png_data/v0', './png_data/png_v0', train_id): the first path is the folder before the division, and the second path is the folder after the division.
Ubuntu 16.04.6 LTS
Python 3.5.2
The required Python packages can be installed by running pip install -r requirements.txt.
There are 4 deep models (RNN, TRNN, CNN, and multi), each with its corresponding input and training code placed in its own directory; in addition there is an SVM model.
Each model consists of three code files: _input, _model, and _train (multi and CNN have no _model file), plus pre-trained model files.
The _input file contains a Dataset class inherited from torch.utils.data.Dataset for data reading and pre-processing. Sample balancing is handled in the list_file function of the _input file, which simply expands the number of samples of every class to the same size.
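A minimal sketch of this oversampling idea is given below. The class name, constructor input, and loading logic are illustrative, not the repo's actual `_input` code.

```python
import random
from torch.utils.data import Dataset

class BalancedFileDataset(Dataset):
    """Sketch of _input-style class balancing: every class's file list is
    expanded (repetition plus random resampling) to the size of the
    largest class. Names and inputs here are illustrative."""
    def __init__(self, files_by_class, seed=0):
        rng = random.Random(seed)
        max_n = max(len(v) for v in files_by_class.values())
        self.samples = []
        for label, cls in enumerate(sorted(files_by_class)):
            files = files_by_class[cls]
            expanded = files * (max_n // len(files))
            expanded += rng.sample(files, max_n % len(files))
            self.samples += [(f, label) for f in expanded]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        # real code would load and pre-process the swc/png file here
        return path, label
```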
The _model file contains the code needed to build the model, i.e. a subclass of torch.nn.Module.
The _train file contains the code that trains or tests the model; the required parameters and data directories are placed at the beginning of the file. To use a pre-trained model, set the trained parameter to True and set the model_filename parameter to the corresponding model file name.
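The pre-trained-model path can be sketched as follows. The model and file name are stand-ins for illustration, not the repo's actual ones; only the `trained`/`model_filename` flag names come from the README.

```python
import torch
import torch.nn as nn

# Stand-in model; pretend a previous run saved its weights.
model = nn.Linear(4, 2)
torch.save(model.state_dict(), 'demo_model.pt')

trained = True                    # the flag described above
model_filename = 'demo_model.pt'  # hypothetical model file name
if trained:
    model.load_state_dict(torch.load(model_filename, map_location='cpu'))
```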
To train a model, open the corresponding _train file, comment out the relevant parts depending on whether you classify into 2 or 12 categories, and modify the data directory data_dir and the hyperparameters. For the data_dir format, refer to the original setting. If you use new data, first run scratch.py to pre-divide the training and test sets and place the neurons by class. When training is complete, the corresponding model file is automatically generated under the CNN_train folder.
If there is more than one GPU on the server, use os.environ to specify the GPUs. The CNN model uses nn.DataParallel to train on four GPUs simultaneously; if the server does not have multiple GPUs, this may cause errors.
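A sketch of the GPU-selection pattern (the model is a stand-in; the device IDs are an example):

```python
import os

# Restrict visible GPUs *before* CUDA is initialized; the CNN model here
# expects four visible devices when wrapped in nn.DataParallel.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3'

import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the CNN model
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # splits each batch across the GPUs
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
```

On a single-GPU or CPU-only machine, `device_count()` is at most 1, so the `DataParallel` wrapper is simply skipped.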
Uses two LSTM layers and adds a fully connected layer at the end; more details are in structure.pptx.
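The architecture just described can be sketched like this; the layer sizes and input dimension are illustrative, not taken from the repo.

```python
import torch
import torch.nn as nn

class RnnClassifier(nn.Module):
    """Sketch of the RNN model: two LSTM layers followed by one fully
    connected layer (sizes are illustrative)."""
    def __init__(self, in_dim=8, hidden=128, num_classes=12):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):              # x: (batch, seq_len, in_dim)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])     # classify from the last time step

logits = RnnClassifier()(torch.randn(4, 20, 8))  # -> shape (4, 12)
```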
The model is adapted from the paper "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks",
but is simpler than the original paper's model. It consists of two LSTMCell layers, unrolled in order from the leaf nodes to the root node, with a fully connected layer at the end; more details are in structure.pptx.
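The leaf-to-root unrolling can be sketched as below. This is an illustrative single-layer version written for this README (the repo's model stacks two LSTMCell layers); it simply sums child states at each node.

```python
import torch
import torch.nn as nn

class TreeRnnSketch(nn.Module):
    """Illustrative TRNN sketch: an LSTMCell applied from the leaves up to
    the root, summing child states at each node, then a fully connected
    layer classifies from the root state."""
    def __init__(self, in_dim=8, hidden=64, num_classes=12):
        super().__init__()
        self.cell = nn.LSTMCell(in_dim, hidden)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, feats, children):
        # feats: (n_nodes, in_dim); children[i]: indices of node i's children.
        # Nodes must be topologically ordered leaves-first, root last.
        zero = feats.new_zeros(1, self.cell.hidden_size)
        h, c = [None] * len(feats), [None] * len(feats)
        for i in range(len(feats)):
            hc = sum((h[k] for k in children[i]), zero)
            cc = sum((c[k] for k in children[i]), zero)
            h[i], c[i] = self.cell(feats[i:i + 1], (hc, cc))
        return self.fc(h[-1])  # logits from the root node's hidden state
```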
Uses a pre-trained ResNet-18 and replaces the last fully connected layer to match the number of categories. Training uses a smaller learning rate for the front layers and a larger learning rate for the final fully connected layer.
Multi imports the pre-trained trnn_model and cnn_model (each excluding its last fully connected layer), i.e. it extracts the two feature vectors, concatenates them, and adds one fully connected layer on top. Only this final fully connected layer is trained.
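The fusion idea can be sketched with stand-in branch modules; the class name, dimensions, and branches below are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Sketch of the multi model: concatenate feature vectors from the
    pre-trained TRNN and CNN branches (heads removed, weights frozen) and
    train only one final fully connected layer."""
    def __init__(self, trnn_branch, cnn_branch, trnn_dim, cnn_dim, num_classes=12):
        super().__init__()
        self.trnn, self.cnn = trnn_branch, cnn_branch
        for p in list(self.trnn.parameters()) + list(self.cnn.parameters()):
            p.requires_grad = False  # only the fusion layer below is trained
        self.fc = nn.Linear(trnn_dim + cnn_dim, num_classes)

    def forward(self, seq_feat, img_feat):
        z = torch.cat([self.trnn(seq_feat), self.cnn(img_feat)], dim=1)
        return self.fc(z)

# Stand-in feature extractors with known output sizes:
fusion = FusionSketch(nn.Linear(10, 64), nn.Linear(20, 512), 64, 512)
out = fusion(torch.randn(4, 10), torch.randn(4, 20))  # -> (4, 12)
```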
From SQL, 19 morphology-related features were selected and saved in the features file (pickle format, see structure.pptx); classification uses an SVM (sklearn.svm.SVC).
The 19 dimensions include:
Total_Length, Number_of_Bifurcations, Fractal_Dimension, Number_of_Stems, Number_of_Branches, Total_Surface,
Max_Branch_Order, Average_Rall_s_Ratio, Max_Euclidean_Distance, Partition_Asymmetry, Overall_Depth, Soma_Surface,
Overall_Height, Average_Diameter, Overall_Width, Total_Volume, Average_Bifurcation_Angle_Remote,
Average_Bifurcation_Angle_Local, Max_Path_Distance
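An illustrative SVM pipeline on 19-dimensional feature vectors is shown below. The random data stands in for the real feature file (whose pickle layout is described in structure.pptx, not reproduced here).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for the 19 morphological features and two-class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 19))
y = rng.integers(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel='rbf').fit(X_tr, y_tr)  # sklearn.svm.SVC, as in the repo
acc = clf.score(X_te, y_te)              # accuracy on the held-out split
```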