Authors: Tielin Zhang, Yue Zhang, Likai Tang (CASIA). Contact: [email protected], http://bii.ia.ac.cn/~tielin.zhang
0 [0.0, 0.0, 0.0] 1 -1 7.64 0 0.0 0.0
1 [6.54, 3.93, 0.0] 1 0 7.64 -1 7.63 0.0
2 [-6.54, -3.93, 0.0] 1 0 7.64 -1 7.63 0.0
3 [-4.89, 11.54, -0.27] 4 0 1.09 1 12.54 71.13
4 [2.08, -13.49, 6.19] 3 0 0.54 1 14.99 66.08
……
Structure of each line in an swc file:
['id', 'P', 'type', 'parent', 'width', 'branch_level', 'path_length', 'degree'], where:
id: ID of the node
P: the 3D position; the soma is at (0, 0, 0)
type: 1 = soma (core), 2 = axon, 3 = (basal) dendrite, 4 = apical dendrite
parent: ID of the node's parent
width: diameter of the neurite at this node
branch_level: branching level of the node; -1 = end point
path_length: path distance from the node to the soma (center node)
degree: angle between the two child branches
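The field layout above can be illustrated with a small parser. `parse_swc_line` is a hypothetical helper written for this README, not a function from the repo; it only assumes the field order listed above.

```python
# Hypothetical helper: parse one line of the SWC-style format described above.
# Field order: id, P (x, y, z), type, parent, width, branch_level,
# path_length, degree.
def parse_swc_line(line):
    tokens = line.replace('[', ' ').replace(']', ' ').replace(',', ' ').split()
    return {
        'id': int(tokens[0]),
        'P': (float(tokens[1]), float(tokens[2]), float(tokens[3])),
        'type': int(tokens[4]),
        'parent': int(tokens[5]),
        'width': float(tokens[6]),
        'branch_level': int(float(tokens[7])),
        'path_length': float(tokens[8]),
        'degree': float(tokens[9]),
    }

node = parse_swc_line("3 [-4.89, 11.54, -0.27] 4 0 1.09 1 12.54 71.13")
# node['type'] == 4 (apical dendrite), node['parent'] == 0
```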
In the swc_data folder:
swc_v0: raw data crawled from the site
swc_v1: first version of the data, repaired by software
swc_v3: final version of the data, repaired by software
swc_fake: virtual neuron data generated by software
In the png_data folder:
png_v0: raw neuronal image data crawled from the site
png_v1: image data generated from swc_v1
png_fake: virtual neuron data generated by software
png_r: resampled_rat_img, resampled images of neurons; the treated neurons look simpler
png_fc: fixed_coordinate_rat_img, XY-aligned images of neurons
The swc folders follow the same directory layout as the png folders.
For example, there are two folders, train and test, under swc_v0.
Under train there are 5 folders (the primary cell classes), such as principal cell, each containing neurons of the same primary class.
Under principal cell there are several folders (the secondary cell classes), such as pyramidal, each containing neurons of the same secondary class.
Under pyramidal there are several folders (the tertiary cell classes), such as Not reported, each containing neurons of the same tertiary class.
Under Not reported there are several swc files.
To divide a dataset, put all swc or png files in the same folder and run scratch.py; it splits the dataset into train and test sets and into the major and sub-classes. Modify the last two path parameters of divide_dataset('./png_data/v0', './png_data/png_v0', train_id): the first path is the folder before the division, and the second path is the folder after the division.
Ubuntu 16.04.6 LTS
Python 3.5.2
The required Python packages can be installed by running pip install -r requirements.txt.
There are 4 deep models (RNN, TRNN, CNN, and multi), each with its corresponding input and training code placed in its own directory; in addition there is an SVM model.
Each model consists of three code files: _input, _model, and _train (multi and CNN have no _model file), plus pre-trained model files.
The _input file contains a Dataset class inherited from torch.utils.data.Dataset for data reading and pre-processing. Sample balancing is handled in the list_file function of the _input file, which simply expands the number of samples of every class to the same size.
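A minimal sketch of this oversampling idea is given below. The class name, constructor input, and loading logic are illustrative, not the repo's actual `_input` code.

```python
import random
from torch.utils.data import Dataset

class BalancedFileDataset(Dataset):
    """Sketch of _input-style class balancing: every class's file list is
    expanded (repetition plus random resampling) to the size of the
    largest class. Names and inputs here are illustrative."""
    def __init__(self, files_by_class, seed=0):
        rng = random.Random(seed)
        max_n = max(len(v) for v in files_by_class.values())
        self.samples = []
        for label, cls in enumerate(sorted(files_by_class)):
            files = files_by_class[cls]
            expanded = files * (max_n // len(files))
            expanded += rng.sample(files, max_n % len(files))
            self.samples += [(f, label) for f in expanded]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        # real code would load and pre-process the swc/png file here
        return path, label
```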
The _model file contains the code needed to build the model, i.e. a subclass of torch.nn.Module.
The _train file contains the code that trains or tests the model; the required parameters and data directories are placed at the beginning of the file. To use a pre-trained model, set the trained parameter to True and set the model_filename parameter to the corresponding model file name.
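The pre-trained-model path can be sketched as follows. The model and file name are stand-ins for illustration, not the repo's actual ones; only the `trained`/`model_filename` flag names come from the README.

```python
import torch
import torch.nn as nn

# Stand-in model; pretend a previous run saved its weights.
model = nn.Linear(4, 2)
torch.save(model.state_dict(), 'demo_model.pt')

trained = True                    # the flag described above
model_filename = 'demo_model.pt'  # hypothetical model file name
if trained:
    model.load_state_dict(torch.load(model_filename, map_location='cpu'))
```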
To train a model, open the corresponding _train file, comment out the relevant parts depending on whether you classify into 2 or 12 categories, and modify the data directory data_dir and the hyperparameters. For the data_dir format, refer to the original setting. If you use new data, first run scratch.py to pre-divide the training and test sets and place the neurons by class. When training is complete, the corresponding model file is automatically generated under the CNN_train folder.
If there is more than one GPU on the server, use os.environ to specify the GPUs. The CNN model uses nn.DataParallel to train on four GPUs simultaneously; if the server does not have multiple GPUs, this may cause errors.
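A sketch of the GPU-selection pattern (the model is a stand-in; the device IDs are an example):

```python
import os

# Restrict visible GPUs *before* CUDA is initialized; the CNN model here
# expects four visible devices when wrapped in nn.DataParallel.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3'

import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the CNN model
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # splits each batch across the GPUs
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
```

On a single-GPU or CPU-only machine, `device_count()` is at most 1, so the `DataParallel` wrapper is simply skipped.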
Uses two LSTM layers and adds a fully connected layer at the end; more details are in structure.pptx.
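The architecture just described can be sketched like this; the layer sizes and input dimension are illustrative, not taken from the repo.

```python
import torch
import torch.nn as nn

class RnnClassifier(nn.Module):
    """Sketch of the RNN model: two LSTM layers followed by one fully
    connected layer (sizes are illustrative)."""
    def __init__(self, in_dim=8, hidden=128, num_classes=12):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):              # x: (batch, seq_len, in_dim)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])     # classify from the last time step

logits = RnnClassifier()(torch.randn(4, 20, 8))  # -> shape (4, 12)
```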
The model is adapted from the paper "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks",
but is simpler than the original paper's model. It consists of two LSTMCell layers, unrolled in order from the leaf nodes to the root node, with a fully connected layer at the end; more details are in structure.pptx.
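The leaf-to-root unrolling can be sketched as below. This is an illustrative single-layer version written for this README (the repo's model stacks two LSTMCell layers); it simply sums child states at each node.

```python
import torch
import torch.nn as nn

class TreeRnnSketch(nn.Module):
    """Illustrative TRNN sketch: an LSTMCell applied from the leaves up to
    the root, summing child states at each node, then a fully connected
    layer classifies from the root state."""
    def __init__(self, in_dim=8, hidden=64, num_classes=12):
        super().__init__()
        self.cell = nn.LSTMCell(in_dim, hidden)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, feats, children):
        # feats: (n_nodes, in_dim); children[i]: indices of node i's children.
        # Nodes must be topologically ordered leaves-first, root last.
        zero = feats.new_zeros(1, self.cell.hidden_size)
        h, c = [None] * len(feats), [None] * len(feats)
        for i in range(len(feats)):
            hc = sum((h[k] for k in children[i]), zero)
            cc = sum((c[k] for k in children[i]), zero)
            h[i], c[i] = self.cell(feats[i:i + 1], (hc, cc))
        return self.fc(h[-1])  # logits from the root node's hidden state
```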
Uses a pre-trained ResNet-18 and replaces the last fully connected layer to match the number of categories. Training uses a smaller learning rate for the front layers and a larger learning rate for the final fully connected layer.
Multi imports the pre-trained trnn_model and cnn_model (each excluding its last fully connected layer), i.e. it extracts the two feature vectors, concatenates them, and adds one fully connected layer on top. Only this final fully connected layer is trained.
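The fusion idea can be sketched with stand-in branch modules; the class name, dimensions, and branches below are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Sketch of the multi model: concatenate feature vectors from the
    pre-trained TRNN and CNN branches (heads removed, weights frozen) and
    train only one final fully connected layer."""
    def __init__(self, trnn_branch, cnn_branch, trnn_dim, cnn_dim, num_classes=12):
        super().__init__()
        self.trnn, self.cnn = trnn_branch, cnn_branch
        for p in list(self.trnn.parameters()) + list(self.cnn.parameters()):
            p.requires_grad = False  # only the fusion layer below is trained
        self.fc = nn.Linear(trnn_dim + cnn_dim, num_classes)

    def forward(self, seq_feat, img_feat):
        z = torch.cat([self.trnn(seq_feat), self.cnn(img_feat)], dim=1)
        return self.fc(z)

# Stand-in feature extractors with known output sizes:
fusion = FusionSketch(nn.Linear(10, 64), nn.Linear(20, 512), 64, 512)
out = fusion(torch.randn(4, 10), torch.randn(4, 20))  # -> (4, 12)
```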
From SQL, 19 morphology-related features were selected and saved in the features file (pickle format, see structure.pptx); classification uses an SVM (sklearn.svm.SVC).
The 19 dimensions include:
Total_Length, Number_of_Bifurcations, Fractal_Dimension, Number_of_Stems, Number_of_Branches, Total_Surface,
Max_Branch_Order, Average_Rall_s_Ratio, Max_Euclidean_Distance, Partition_Asymmetry, Overall_Depth, Soma_Surface,
Overall_Height, Average_Diameter, Overall_Width, Total_Volume, Average_Bifurcation_Angle_Remote,
Average_Bifurcation_Angle_Local, Max_Path_Distance
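An illustrative SVM pipeline on 19-dimensional feature vectors is shown below. The random data stands in for the real feature file (whose pickle layout is described in structure.pptx, not reproduced here).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for the 19 morphological features and two-class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 19))
y = rng.integers(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel='rbf').fit(X_tr, y_tr)  # sklearn.svm.SVC, as in the repo
acc = clf.score(X_te, y_te)              # accuracy on the held-out split
```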