Comments (5)
@sflender Like most network training paradigms, diluvian terminates training early if the validation loss (or actually, in this case, `validation_metric`, which is by default the F_0.5 score of masks of single-body validation volumes) has not improved in several epochs. The number of epochs is controlled by the `CONFIG.training.patience` parameter (see docs). To effectively disable this, just set `patience` to be larger than `total_epochs`.
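For example, since diluvian reads its settings from a TOML config, disabling early stopping might look like the following fragment (the `[training]` section name is an assumption; `patience` and `total_epochs` are the parameters named above):

```toml
[training]
total_epochs = 100
# Set patience larger than total_epochs so the "no improvement in N
# epochs" check can never fire before training finishes.
patience = 101
```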
from diluvian.
Hi Ravil,
My guess as to what's happening is this:
> the values of loss and validation loss were about the same (0.63)
For a near-default configuration this loss is high, and it's unlikely the network is generating high-probability, structured output yet. Hence the filled volumes are empty: the network is not yet trained to the point where its output probability is high enough to cause FOV moves. You can tell this is happening if, during the filling step, the lower progress bar (which tracks filling of each region) flashes by quickly and shows `1 / 1` at the end. Another way to check: if you increase the output verbosity during training (`-l INFO`), the training generators will show the average number of moves in the training subvolumes. In normal training this should increase to near or above 20 once the network is learning to move.
These networks take a very long time to train to have performance competitive with other approaches -- days on 8 GPUs. However, on 2 GPUs you should be able to train the network to a point where the output starts to look like neuronal segmentation in less than 8 hours.
Some things to try for fast training to good results:
- switching from SGD to Adam
- orders of magnitude larger training and validation sizes than the defaults
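Both suggestions above are config changes. A hypothetical fragment along these lines (section and key names are assumptions -- check the diluvian config docs for the actual schema; the values are illustrative, not recommendations):

```toml
# Illustrative only: key names and values are assumptions.
[optimizer]
klass = "Adam"

[training]
# Orders of magnitude larger than the defaults, per the suggestion above.
training_size = 1024
validation_size = 256
```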
Hi Ravil, Andrew, do you understand why training was terminated automatically after the 76th epoch, even though you did not configure it to stop early? I observed similar behavior.
Best,
-Samuel.
@aschampion I did the things that you suggested, including changing the optimizer and enlarging the training and validation sizes, but still can't get good results. The training loss stops falling at about 0.3, and the validation loss is about 0.5. I'm using 1 GPU with a batch size of 12. I changed the learning rate, but saw no progress. What should I do next? Thanks
@Jingliu1994 You can continue to sweep the parameter space, but there are several things I would suggest first:
- If you're using the CREMI data, the official data still has many ground truth errors and quality problems (e.g., random labels in blank sections). The MALA v2 submission on the CREMI front page has realigned volumes with much better ground truth labels. The ground truth I was using when working on FFNs was based on this, but for various reasons I don't automatically distribute it with diluvian.
- Ignore the validation loss (which is often meaningless because of the FFN training process -- this is why Google's paper validates with skeleton metrics instead) and pay attention only to the F_beta validation metric. Even if the training loss improvement is minuscule, the validation metric may still be improving. An example pulled at random from my logs ("val subvolumes" is the validation metric):
- Apply the network at higher resolution, 8nm or 4nm. (Will greatly increase inference time)
- Use a larger input FOV. (Will greatly increase training and inference time)
You should also be aware that Google released their implementation a few weeks ago. If you just want good results and aren't that concerned with having a simple sandbox to experiment with FFN-like architectures, it's probably a better choice than diluvian for you. Multi-segmentation consensus and FFN-based merging are both crucial to the quality of FFN results reported in Google's original paper; diluvian doesn't implement either of these.
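For reference, the F_beta validation metric discussed above (with beta = 0.5, weighting precision more heavily than recall) can be sketched as follows. This is a standalone illustration computed on flattened binary masks; diluvian's own implementation may differ:

```python
# Minimal F_beta sketch for binary masks (standalone illustration,
# not diluvian's actual code).

def f_beta(pred, true, beta=0.5):
    """F_beta score of a predicted mask against a ground truth mask,
    both given as flat sequences of 0/1 values."""
    tp = sum(1 for p, t in zip(pred, true) if p and t)
    fp = sum(1 for p, t in zip(pred, true) if p and not t)
    fn = sum(1 for p, t in zip(pred, true) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)

pred = [1, 1, 1, 0]
true = [1, 1, 0, 1]
# precision = 2/3, recall = 2/3, so F_0.5 = 2/3
print(round(f_beta(pred, true), 4))  # -> 0.6667
```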