
cdeep3m's People

Contributors

coleslaw481, divelab, haberlmatt, ltindall, matthewbm, redistributer, xiaomi2008

cdeep3m's Issues

problems executing the example scripts

Hi,
I installed the software on our Ubuntu 16.04 workstation with CUDA 8.0 and tried to run the example scripts, but both produce errors... Any suggestions? Thanks in advance!
Kevin

k.knoops@nano:~$ runprediction.sh ~/sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out ~/cdeep3m-1.4.0/mito_testsample/testset/ ~/predictout30k
Starting Image Augmentation
Check image size of:
/home/local/UNIMAAS/k.knoops/cdeep3m-1.4.0/mito_testsample/testset/
Reading file: /home/local/UNIMAAS/k.knoops/cdeep3m-1.4.0/mito_testsample/testset/images.081.png
z_blocks =

1 5

Start up worker to generate packages to process
Start up worker to run prediction on packages
Start up worker to run post processing on packages

To see progress run the following command in another window:

tail -f /home/local/UNIMAAS/k.knoops/predictout30k/logs/*.log
error: 'fileformats' undefined near line 13 column 30
error: called from
filter_files at line 13 column 23
/home/local/UNIMAAS/k.knoops/cdeep3m-1.4.0/EnsemblePredictions.m at line 35 column 12
error: evaluating argument list element number 1
error: called from
filter_files at line 13 column 23
/home/local/UNIMAAS/k.knoops/cdeep3m-1.4.0/EnsemblePredictions.m at line 35 column 12
ERROR, a non-zero exit code (1) was received from: EnsemblePredictions.m /home/local/UNIMAAS/k.knoops/predictout30k/1fm /home/local/UNIMAAS/k.knoops/predictout30k/3fm /home/local/UNIMAAS/k.knoops/predictout30k/5fm /home/local/UNIMAAS/k.knoops/predictout30k/ensembled
k.knoops@nano:~$

k.knoops@nano:~/cdeep3m-1.4.0$ ./runtraining.sh /home/local/UNIMAAS/k.knoops/mito_testaugtrain ~/output
Verifying input training data is valid ... success
Copying over model files and creating run scripts ... success

A new directory has been created: /home/local/UNIMAAS/k.knoops/output
In this directory are 3 directories 1fm,3fm,5fm which
correspond to 3 caffe models that need to be trained

Detected 2 GPU(s). Will run in parallel.
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
Non zero exit code from caffe for train of model. Exiting.
ERROR, a non-zero exit code (1) was received from: trainworker.sh --numiterations 30000
k.knoops@nano:~/cdeep3m-1.4.0$
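
(Note: exit code 127 from the shell normally means "command not found", so a first check, assuming caffe is expected to be on the PATH, would be:

which caffe || echo "caffe not found on PATH"
)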

Problems with EnsemblePredictions.m

I already ran dos2unix on EnsemblePredictions.m because Octave was giving errors when trying to run the script from the command line. I also made the script executable, so please refresh your source tree (on the master branch).

When I run EnsemblePredictions.m with no arguments I get this error message:

./EnsemblePredictions.m 
error: Invalid call to exist.  Correct usage is:

 -- Built-in Function: C = exist (NAME)
 -- Built-in Function: C = exist (NAME, TYPE)
error: called from
    print_usage at line 90 column 5
    ./EnsemblePredictions.m at line 21 column 1

Octave does not like calling exist on the to_process variable. I wasn't sure what the code is trying to do here, so I made a ticket.
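
For reference, a minimal sketch of how the argument could be guarded before exist is called (the usage message and variable handling are assumptions, not the script's actual code):

% guard against a missing command-line argument before calling exist
arg_list = argv();
if numel(arg_list) < 1
  error('Usage: EnsemblePredictions.m <inputdir> ... <outputdir>');
end
to_process = arg_list{1};
if exist(to_process, 'dir') ~= 7   % exist returns 7 when NAME is a directory
  error('%s is not a directory', to_process);
end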

Questions about these .m files

Are these used or can they be deleted?

scripts/de_augment_data copy.m
scripts/Images2H5.m
scripts/post_processing/combinePredicctionSlice copy.m

Add call to postprocess prediction data in Predict.m

Right after prediction is run for each model (1fm, 3fm, 5fm), run the post image processing script (StartPostprocessing.m) to reduce the data footprint. This process should be invoked and NOT waited on unless it is the last job to run (see the sketch below).
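
A rough bash sketch of the intended flow (run_prediction_for and the StartPostprocessing.m argument form are placeholders, not the actual pipeline code):

for model in 1fm 3fm 5fm ; do
    run_prediction_for "$model"                    # hypothetical prediction step
    if [ "$model" = "5fm" ] ; then
        StartPostprocessing.m "$out_dir/$model"    # last job: wait on it
    else
        StartPostprocessing.m "$out_dir/$model" &  # earlier jobs: fire and forget
    fi
done
wait   # make sure backgrounded postprocessing finishes before exiting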

Stitching

Improve stitching of large images

Switch runprediction.sh pipeline to augment data on the fly

Instead of running PreprocessImageData.m in runprediction.sh, have the code call:
def_datapackages.m

def_datapackages.m also writes a text file in the same folder (where the de_augmentation matlab file is stored), which the wrapper could use to tell how many z-stacks and how many x/y packages are to be done.

Then, in a loop, preprocess_package.m must be called for every z-stack and every data package to create the augmented images (see the sketch after the usage block below).

preprocess_package.m <indir> <outdir> <xy_package> <z_stack> <fm> <augmentation speed>
% Example: preprocess_package ~/EMdata1/ ~/AugmentedEMData/ 15 2 1fm 10
%
% Speed: supported values are 1, 2, 4 or 10
% higher values speed up processing, potentially with a negative effect on accuracy (a speed of 1 gives the highest accuracy)
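
A rough sketch of that wrapper loop (n_zstacks, n_pkgs, indir and outdir are placeholders whose values would come from the text file written by def_datapackages.m):

# call preprocess_package.m once per z-stack and x/y package
for z in $(seq 1 "$n_zstacks") ; do
    for pkg in $(seq 1 "$n_pkgs") ; do
        preprocess_package.m "$indir" "$outdir" "$pkg" "$z" 1fm 10
    done
done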

Update CreateTrainJob.m with guess of what iteration of trained model to use

Add this at the end of CreateTrainJob.m: the code should look at out.log for each model, get the loss value for all iterations, and find where the slope of that curve levels out or drops below some threshold. Output this value into a README.txt file placed in the output of the train directory. A sketch follows the grep below.

grep "loss = " LOG/* | sed "s/^.*]//"

Reorganization of script files

Would you be opposed to a new structure for the source tree?

I'm also going to shy away from dropping scripts into the train job and predict job folders and just have them in the path, because it's easier to test.

I was thinking something like this (I already started messing with this new style in the chrisdev branch):

.
├── aws/
├── Makefile
├── model/
│   ├── inception_residual_train_prediction_1fm
│   ├── inception_residual_train_prediction_3fm
│   └── inception_residual_train_prediction_5fm
├── README.md
├── scripts/
│   ├── caffepredict.sh
│   ├── caffetrain.sh
│   ├── CreatePredictJob.m
│   ├── CreateTrainJob.m
│   ├── EnsemblePredictions.m
│   ├── functions/  (all non directly executable matlab files would be put in here)
│   ├── Merge_LargeData.m
│   ├── PreprocessImageData.m
│   ├── PreprocessTrainingData.m
│   ├── run_all_predict.sh
│   ├── run_all_train.sh
│   ├── RunUnitTests.m
│   └── StartPostprocessing.m
├── vagrant/
└── VERSION

Released versions would look like this:

.
├── model/
│   ├── inception_residual_train_prediction_1fm
│   ├── inception_residual_train_prediction_3fm
│   └── inception_residual_train_prediction_5fm
├── README.txt
├── scripts/
│   ├── caffepredict.sh
│   ├── caffetrain.sh
│   ├── CreatePredictJob.m
│   ├── CreateTrainJob.m
│   ├── EnsemblePredictions.m
│   ├── functions/  (all non directly executable matlab files would be put in here)
│   ├── Merge_LargeData.m
│   ├── PreprocessImageData.m
│   ├── PreprocessTrainingData.m
│   ├── run_all_predict.sh
│   ├── run_all_train.sh
│   ├── RunUnitTests.m
│   └── StartPostprocessing.m
└── VERSION

readme

In the Run CDeep3M training and prediction instructions

under the Run Segmentation subtitle
runprediction.sh ~/my_images ~/predictout
should be
runprediction.sh ~/my_trained_model ~/my_images ~/predictout

Using the pretrained model at /sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out failed

Hi, I got a problem when trying to reproduce your results locally in a Linux Docker container. I followed all the steps in the wiki just fine until this page: https://github.com/CRBS/cdeep3m/wiki/Tutorial-3-Run-CDeep3M.

I got stuck at step 5 with this command:

runtraining.sh --additerations 20 --retrain ~/sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out ~/augtrain ~/model

The above command failed. I realized that when using the pretrained model at sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out/1fm/trainedmodel/1fm_classifer_iter_30000.solverstate, the solverstate file refers by default to the model file "1fm_classifer_iter_30000.caffemodel" at the location /home/ubuntu/sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out/1fm/trainedmodel/1fm_classifer_iter_30000.caffemodel, which was written into the solverstate binary itself!

Since my installation of CDeep3M is not under /home/ubuntu, I'm wondering if it is possible to update this default caffemodel path in the solverstate file when retraining from 30000iterations_train_out. Furthermore, if I train my own model, will the snapshot recorded in the solverstate file include the absolute path instead of a relative one? This matters if someone else wants to use the pretrained model; otherwise they would have to reproduce exactly the same directory layout I have locally.
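
One untested workaround I can think of is to recreate the path that was baked into the solverstate with a symlink, so caffe can resolve it:

sudo mkdir -p /home/ubuntu/sbem/mitochrondria/xy5.9nm40nmz
sudo ln -s ~/sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out \
    /home/ubuntu/sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out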

Thanks for your help.

Enable parallelism in prediction

Update the caffepredict.sh script to examine the number of GPUs and run predictions in parallel so all GPUs found are utilized. This should also be done in the run_all_predict.sh script.
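
A sketch of the GPU detection (the per-GPU dispatch itself is left out; CUDA_VISIBLE_DEVICES is the usual way to pin a process to one GPU):

num_gpus=$(nvidia-smi --list-gpus | wc -l)
echo "Detected $num_gpus GPU(s)"
# e.g. run package $i on GPU $((i % num_gpus)) via
# CUDA_VISIBLE_DEVICES=$((i % num_gpus)) caffepredict.sh ...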

Adding test data for users

We should add a set of test and training images somewhere (not sure where, but we could host them on ccdb?) so that users can go through all the steps.

Bug in StartPostprocessing.m

This worked previously; possibly a new error was introduced while augmenting the image data:
Generating Average Prediction of /home/ubuntu/ImageData/Results_Mito/1fm/
all_files =

104x1 struct array containing the fields:

name
date
bytes
isdir
datenum
statinfo

Merging 16 variations of file test.h5_shift_ ... number 1 of 101
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
error: recover8Variation: A(I,J,...) = X: dimensions mismatch
error: called from
merge_16_probs_v2>recover8Variation at line 104 column 32
merge_16_probs_v2 at line 27 column 15
./StartPostprocessing.m at line 43 column 21

/usr/bin/time not available on all systems

Not all systems have time installed (or in /usr/bin, for that matter), so it might be a good idea to remove the /usr/bin prefix and have the user update their PATH, or to add a parameter to the .sh scripts that lets the user set the time command, with the default set to just time.
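
A sketch of the parameterized approach (TIME_CMD is an assumed variable name, not something the scripts currently define):

# prefer /usr/bin/time when present, otherwise skip timing rather than fail
if [ -x /usr/bin/time ] ; then
    TIME_CMD="/usr/bin/time -p"
else
    TIME_CMD=""
fi
$TIME_CMD ./run_all_predict.sh "$@"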

Modify runprediction.sh to NOT delete png files under model directories

In case a given model does better than the others, it is better not to delete the png files under the model directories (1fm, 3fm, 5fm) in runprediction.sh. Currently this is done towards the end of the runprediction.sh script by these lines:

for Y in `echo $space_sep_models` ; do
    /bin/rm -f $out_dir/$Y/*.png
done

Just remove these lines for now.

How to invoke Merge_LargeData.m

I tried running Merge_LargeData.m on this directory:

.
├── 1fm
│   ├── de_augmentation_info.mat
│   └── Pkg001_Z01
│       ├── DONE
│       ├── log
│       ├── out.log
│       ├── test.h5_shift__0000.png
│       ├── test.h5_shift__0001.png
│       ├── test.h5_shift__0002.png
│       ├── test.h5_shift__0003.png
│       └── test.h5_shift__0004.png
├── 3fm
│   ├── de_augmentation_info.mat
│   └── Pkg001_Z01
│       ├── DONE
│       ├── log
│       ├── out.log
│       ├── test.h5_shift__0000.png
│       ├── test.h5_shift__0001.png
│       ├── test.h5_shift__0002.png
│       ├── test.h5_shift__0003.png
│       └── test.h5_shift__0004.png
├── 5fm
│   ├── de_augmentation_info.mat
│   └── Pkg001_Z01
│       ├── DONE
│       ├── log
│       ├── out.log
│       ├── test.h5_shift__0000.png
│       ├── test.h5_shift__0001.png
│       ├── test.h5_shift__0002.png
│       ├── test.h5_shift__0003.png
│       └── test.h5_shift__0004.png
├── caffe_predict.sh
├── de_augmentation_info.mat
├── out
└── run_all_predict.sh

With this command for 1fm:

Merge_LargeData.m 1fm/de_augmentation_info.mat ./out/

but got this error.

octave: X11 DISPLAY environment variable not set
octave: disabling GUI features
Starting to merge large image dataset
Processing:
1fm/de_augmentation_info.mat
Combining image stacks
error: 'fileformats' undefined near line 13 column 30
error: called from
    filter_files at line 13 column 23
    /home/ubuntu/deep3m/Merge_LargeData.m at line 72 column 9
error: evaluating argument list element number 1
error: called from
    filter_files at line 13 column 23
    /home/ubuntu/deep3m/Merge_LargeData.m at line 72 column 9
error: evaluating argument list element number 1
error: called from
    /home/ubuntu/deep3m/Merge_LargeData.m at line 72 column 9

I'm guessing I am doing something wrong.

Add progress to run_all_predict.sh

There should be progress output for each Pkg_Z## and for each .h5 being processed within. An output of elapsed time would also be nice, something like:

Running 1fm 53 Pkg_Z## folders to process
  Running Pkg_Z### X of 53.....(period for each .h5 file). <time in seconds>
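
A rough bash sketch of such a progress loop (model_dir and predict_one are placeholders; SECONDS is bash's built-in elapsed-time counter):

pkgs=$(ls -d "$model_dir"/Pkg*_Z* | wc -l)
echo "Running 1fm $pkgs Pkg_Z## folders to process"
i=0
for pkg in "$model_dir"/Pkg*_Z* ; do
    i=$((i + 1))
    SECONDS=0
    printf '  Running %s %d of %d' "$(basename "$pkg")" "$i" "$pkgs"
    for h5 in "$pkg"/*.h5 ; do
        predict_one "$h5"    # hypothetical per-file prediction step
        printf '.'
    done
    printf ' %d seconds\n' "$SECONDS"
done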

typo in PreprocessImageData.m

There is a typo in the fprintf on line 105. I can fix it, but I wasn't sure if you were working on this file, so I made a ticket.

Output from run:

Created 1 packages in x/y with 1 z-stacks
error: fprintf: invalid format specified
error: called from
    ./PreprocessImageData.m at line 105 column 1
fprintf('Data stored in:\n %\n', outdir);
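
Presumably the bare % just needs to become %s:

fprintf('Data stored in:\n %s\n', outdir);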

Run 2D versions

Could you duplicate the Run_all scripts as versions called Run_2D, which will only launch the 1fm versions of train and predict?

Create or rename Create scripts

Rename or write new scripts to replace CreateTrainJob and CreatePredictJob. They should be named something like runtrain and runpredict and take non-augmented images as input.

Create CreatePredictJob.m

This tool should take the output of PreProcessImageData.m and the trained models directory from CreateTrainJob.m, and run prediction on all the Pkg directories. The output should be another directory with a mirrored structure that can be consumed by the postprocessing script. A sketch of building that structure follows the layout below.

Usage:

CreatePredictJob.m <Output of Train.m after training run> <augmented image data> <output directory>

Desired output structure:

<output>/
├── 1fm/
│   ├── <copy over de_augment.m file, or hardlink it>
│   ├── Pkg001/
│   └── Pkg002/
├── 3fm/
│   ├── <copy over de_augment.m file, or hardlink it>
│   ├── Pkg001/
│   └── Pkg002/
└── 5fm/
    ├── <copy over de_augment.m file, or hardlink it>
    ├── Pkg001/
    └── Pkg002/
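
A rough bash sketch of building that mirrored structure (augdir and outdir are placeholders; the de_augmentation file name is assumed from the rest of the pipeline):

for fm in 1fm 3fm 5fm ; do
    mkdir -p "$outdir/$fm"
    # hardlink the de-augmentation file, falling back to a copy
    ln "$augdir/de_augmentation_info.mat" "$outdir/$fm/" 2>/dev/null \
        || cp "$augdir/de_augmentation_info.mat" "$outdir/$fm/"
    for pkg in "$augdir"/Pkg* ; do
        mkdir -p "$outdir/$fm/$(basename "$pkg")"
    done
done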

uint8 vs single dataformat

I realized that training data was converted and saved as single, whereas other data was saved as uint8. Besides the fact that single is 4 times as large (4 bytes per voxel vs 1), I'm not sure what other effects this could have. I'm currently testing, but I think this has been in the code for a while already.

add to end of PreprocessTraining

Add to the end of PreprocessTraining:
tee ~/deep3m/model/inception_residual_train_prediction_1fm/train_file.txt ~/deep3m/model/inception_residual_train_prediction_3fm/train_file.txt ~/deep3m/model/inception_residual_train_prediction_5fm/train_file.txt < ./train_file.txt >/dev/null

remove Pkg folders

Remove the Pkg folders (and the individual pngs within) once the segmented images are stitched back together.

Script to determine effect of augmentation on accuracy

A script to determine the accuracy when using a decreasing number of augmentations; for this, the old processing structure will be used, with modifications in the postprocessing.

To be included in the distribution, to allow the end user to assess how many augmentations to use (if the goal is to reduce processing time without losing too much accuracy).

Merge_LargeData.m looks for de_augmentation_info.mat in fm directory

Would it be possible for Merge_LargeData.m to look for the de_augmentation_info.mat file in the parent directory? Or maybe be given the path to that file as a command line argument.

$ /home/ubuntu/training_data/predict/run_all_predict.sh /home/ubuntu/100_it_trained_model/ /home/ubuntu/training_data/5stackaug/
Running 1fm predict (1) packages to process
  Processing Pkg001_Z01 1 of 1 
Non zero exit code from caffe for predict /home/ubuntu/training_data/predict/1fm/Pkg001_Z01 model. Exiting.
Here is last 10 lines of /home/ubuntu/training_data/predict/1fm/Pkg001_Z01/out.log:

Starting to merge large image dataset
Processing:
/home/ubuntu/training_data/predict/1fm/de_augmentation_info.mat
error: load: unable to find file /home/ubuntu/training_data/predict/1fm/de_augmentation_info.mat
error: called from
    /home/ubuntu/deep3m/Merge_LargeData.m at line 41 column 1
Command exited with non-zero status 1
real 227.88
user 120.86
sys 78.49
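
A minimal Octave sketch of the fallback lookup (in_dir is a placeholder; assumes the directory path has no trailing slash, so fileparts returns the parent):

% look for de_augmentation_info.mat next to the fm dir, then in its parent
mat_file = fullfile(in_dir, 'de_augmentation_info.mat');
if exist(mat_file, 'file') ~= 2
  mat_file = fullfile(fileparts(in_dir), 'de_augmentation_info.mat');
end
load(mat_file);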
