yixuanli / densenet-tensorflow

DenseNet Implementation in Tensorflow

License: GNU General Public License v3.0

Python 100.00%

tensorflow densenet

densenet-tensorflow's Issues

Question on validation loss

I have run your code on CIFAR-100.
When the learning rate is halved, the test loss suddenly decreases and then increases,
while the validation error stays unchanged.
Does tensorpack maintain a shadow value and show the EMA of the cost (resetting this value after a learning rate change)?

Any suggestions would be appreciated.
Thanks in advance
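
To make the question concrete: an exponential moving average (EMA) lags behind the raw value, so a sudden drop in the loss shows up only gradually in the smoothed curve. The sketch below is plain Python and does not use tensorpack; the function name, the decay of 0.9 and the loss values are made up for illustration, and whether tensorpack reports the cost this way is not established here.

```python
def ema_series(values, decay=0.9):
    """Return the exponential moving average of a sequence of scalars."""
    smoothed = []
    shadow = values[0]  # initialise the shadow value with the first observation
    for v in values:
        shadow = decay * shadow + (1.0 - decay) * v
        smoothed.append(shadow)
    return smoothed


# Hypothetical raw loss: it drops from 1.0 to 0.5 at step 5.
raw_loss = [1.0] * 5 + [0.5] * 5
smoothed_loss = ema_series(raw_loss, decay=0.9)

for step, (raw, smooth) in enumerate(zip(raw_loss, smoothed_loss)):
    print(step, raw, round(smooth, 4))
```

Five steps after the drop, the smoothed value is still well above 0.5, which is the kind of lag the question asks about.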

Need to consolidate TF function calls with newer API.

error while running the code

TypeError: Can't instantiate abstract class StatPrinter with abstract methods _trigger

How to train on a custom dataset?

Great implementation. I want to use it to train on my own dataset, which is similar to CIFAR-10. Could you tell me how to modify the code? I found that it loads the dataset through a built-in function implemented by tensorflow. Thanks.

densenet not training when using tf.contrib.layers.recompute_grad

I want to implement a memory-efficient DenseNet, following the code in
https://github.com/joeyearsley/efficient_densenet_tensorflow/blob/master/models/densenet_creator.py, but the training process is stuck at the first epoch.
I have only changed the add_layer part:

        def add_layer(l):

            def _add_layer(l):
                shape = l.get_shape().as_list()
                in_channel = shape[3]
                with tf.variable_scope(name) as scope:
                    c = BatchNorm('bn1', l)
                    c = tf.nn.relu(c)
                    c = conv('conv1', c, self.growthRate, 1)
                    l = tf.concat([c, l], 3)
                return l
            
            if self.efficient:
                _add_layer = tf.contrib.layers.recompute_grad(_add_layer)
            
            return _add_layer(l)

I also added the keyword argument "efficient" to specify whether to use the memory-efficient version.
However, the training process got stuck.
I am using tensorflow 1.9 and tensorpack 0.9.1.
Do I need to change other parts of tensorpack?
Thanks in advance

JFYI, win7 64bit Anaconda python 3.5.3 tensorpack 0.3.0 running error

Dear.

Windows 7 64bit
Anaconda
python 3.5.3
tensorpack 0.3.0

How can I fix this error?

Running error:

c:\densenet-tensorflow>python cifar10-densenet.py

[0810 18:05:20 @logger.py:107] Use a new log directory train_log/cifar10-single-fisrt150-second225-max3000810-180520
[0810 18:05:20 @logger.py:73] Argv: cifar10-densenet.py
[0810 18:05:20 @fs.py:89] WRN Env var $TENSORPACK_DATASET not set, using C:\Users\java\tensorpack_data for datasets.
[0810 18:05:20 @cifar.py:33] Found cifar10 data in C:\Users\java\tensorpack_data\cifar10_data.

Traceback (most recent call last):
  File "cifar10-densenet.py", line 174, in <module>
    config = get_config()
  File "cifar10-densenet.py", line 143, in get_config
    dataset_train = get_data('train')
  File "cifar10-densenet.py", line 135, in get_data
    ds = PrefetchData(ds, 3, 2)
  File "c:\anaconda3\lib\site-packages\tensorpack\dataflow\prefetch.py", line 84, in __init__
    start_proc_mask_signal(self.procs)
  File "c:\anaconda3\lib\site-packages\tensorpack\utils\concurrency.py", line 212, in start_proc_mask_signal
    p.start()
  File "c:\anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "c:\anaconda3\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "c:\anaconda3\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "c:\anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)
  File "c:\anaconda3\lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'MapDataComponent.__init__.<locals>.f'

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "c:\anaconda3\lib\multiprocessing\spawn.py", line 116, in _main
    self = pickle.load(from_parent)
EOFError: Ran out of input
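
The first trace ends in a "Can't pickle local object" error raised from popen_spawn_win32.py, which suggests the worker process could not be started because a function defined inside another function had to be pickled. The sketch below reproduces that behaviour with the standard library only. It does not use tensorpack, all names in it are hypothetical, and it is not a confirmed fix for tensorpack 0.3.0; it only shows why a local function fails where a partial over a module-level function succeeds.

```python
import functools
import operator
import pickle


def make_local_subtractor(mean):
    # f is defined inside another function, so its qualified name contains
    # "<locals>" and pickle cannot look it up by name.
    def f(x):
        return x - mean
    return f


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


local_fn = make_local_subtractor(5)
# A partial over a module-level function computes the same thing (x + (-5))
# and can be pickled, so it could be sent to a spawned process.
portable_fn = functools.partial(operator.add, -5)

print(can_pickle(local_fn))
print(can_pickle(portable_fn))
print(local_fn(12), portable_fn(12))
```

On platforms where multiprocessing forks instead of spawning, the process object is not pickled at start-up, which may be why the same script can behave differently elsewhere.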

where is the pretrained model?

[0609 15:27:39 @base.py:252] Epoch 14 (global_step 10934) finished, time:59.7 seconds.

I'm running with dual 1080Ti and get 1 min for each epoch, using same parameters you used. Is this a reasonable running time?

And how should I use my own data for training and testing?

Thank you in advance!!!

Results error

How to test the results?
And how to plot the picture you showed?

Tensorpack update

Hi,

I believe the tensorpack repository has been updated, so your repository needs an update too.

I ran the code and the following error appeared:

Traceback (most recent call last):
  File "cifar10-densenet.py", line 173, in <module>
    config = get_config()
  File "cifar10-densenet.py", line 142, in get_config
    dataset_train = get_data('train')
  File "cifar10-densenet.py", line 120, in get_data
    imgaug.CenterPaste((40, 40)),
  File "/home/turanshare/anaconda2/envs/tensorflow/lib/python3.6/site-packages/tensorpack/utils/develop.py", line 151, in __getattr__
    return getattr(module, item)
AttributeError: module 'tensorpack.dataflow.imgaug' has no attribute 'CenterPaste'

Apart from CenterPaste I think the following lines must be updated:

augmentors = [
            imgaug.CenterPaste((40, 40)),
            imgaug.RandomCrop((32, 32)),
            imgaug.Flip(horiz=True),
            #imgaug.Brightness(20),
            #imgaug.Contrast((0.6,1.4)),
            imgaug.MapImage(lambda x: x - pp_mean),
        ]

If you have already solved this please let me know. Otherwise I will be happy to generate a pull request with your help.

Thank you.
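
For anyone porting the augmentation list above, it may help to see what the first three steps appear to do: place the 32x32 image at the centre of a 40x40 canvas, take a random 32x32 crop, and flip horizontally at random. The following is a pure-Python sketch of that reading on a single-channel image. The function names are hypothetical, this is not tensorpack API, and it does not say which tensorpack augmentor replaces CenterPaste.

```python
import random


def center_paste(image, out_h, out_w, fill=0):
    """Place a 2-D image at the centre of a larger canvas filled with `fill`."""
    h, w = len(image), len(image[0])
    top, left = (out_h - h) // 2, (out_w - w) // 2
    canvas = [[fill] * out_w for _ in range(out_h)]
    for r in range(h):
        canvas[top + r][left:left + w] = image[r]
    return canvas


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a uniformly random position."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def horizontal_flip(image):
    """Mirror each row of the image."""
    return [row[::-1] for row in image]


rng = random.Random(0)
image = [[1] * 32 for _ in range(32)]       # dummy 32x32 single-channel image
padded = center_paste(image, 40, 40)        # 4 pixels of zero border on each side
cropped = random_crop(padded, 32, 32, rng)
if rng.random() < 0.5:                      # flip with probability 0.5
    cropped = horizontal_flip(cropped)

print(len(padded), len(padded[0]), len(cropped), len(cropped[0]))
```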

Weights initialization

Hi, thanks for the beautiful code.

I'm just curious about the way you initialize the conv weights:
tf.random_normal_initializer(stddev=np.sqrt(2.0/9/channel))

Could you please explain this setting a bit? The paper says they 'adopt the weight initialization introduced by [10]', which is the MSRA initialization. Thanks in advance ;)
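
For reference, the MSRA (He et al.) scheme draws weights from a zero-mean Gaussian with standard deviation sqrt(2 / n), where n = k * k * c for a k x k convolution with c channels. The sketch below checks numerically that the expression quoted above matches that formula for k = 3, under the assumption that `channel` plays the role of c. The helper name and the example value 12 are illustrative only.

```python
import math


def msra_stddev(kernel_size, channels):
    """Standard deviation sqrt(2 / (k * k * channels)) for a k x k convolution."""
    return math.sqrt(2.0 / (kernel_size * kernel_size * channels))


channel = 12  # chosen only for illustration
from_issue = math.sqrt(2.0 / 9 / channel)   # expression quoted in the question
from_formula = msra_stddev(3, channel)      # sqrt(2 / n) with n = 3 * 3 * channel

print(from_issue, from_formula)
```

The factor 9 is therefore just the 3x3 kernel area. Which channel count (input or output) the code passes as `channel` is not settled by this snippet.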

How much time does one forward and backward pass take for a mini-batch of 64?

My experiment environment is CUDA 8.0.61, cuDNN 6 and a TITAN X (Pascal).
In my implementation, the training time of DenseNet-BC (l=100, k=12) on CIFAR-10 with batch_size 64 is 0.216 s per mini-batch, whereas the original takes 0.153 s.
I wonder whether this is due to my implementation or to tensorflow, so could you tell me your time cost?
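
When comparing per-mini-batch timings like these, it helps to measure them the same way on both sides: discard a few warm-up iterations and average over many steps. The sketch below shows one such measurement loop using only the standard library. `dummy_step` is a placeholder for a real training step, and the step counts are arbitrary.

```python
import time


def mean_step_time(step_fn, num_steps=20, warmup=5):
    """Average wall-clock seconds per call of step_fn, ignoring warm-up calls."""
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    return (time.perf_counter() - start) / num_steps


def dummy_step():
    # Placeholder for one forward and backward pass on a mini-batch.
    sum(i * i for i in range(10000))


seconds_per_batch = mean_step_time(dummy_step)
print("%.6f s/mini-batch" % seconds_per_batch)
```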

cifar100-densenet.py missing

Is cifar100-densenet.py also available, to reproduce the CIFAR-100 results?
Or did I overlook something?

Hi, I got the following error while running the code on two GPUs (tensorflow 1.9.0, in a Docker container).

[0507 11:40:09 @training.py:50] [DataParallel] Training a model of 2 towers.
[0507 11:40:09 @interface.py:31] Automatically applying QueueInput on the DataFlow.
[0507 11:40:09 @interface.py:43] Automatically applying StagingInput on the DataFlow.
Traceback (most recent call last):

    launch_train_with_config(config, SyncMultiGPUTrainer(nr_tower))

  File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/interface.py", line 90, in launch_train_with_config
    model.get_input_signature(), input,
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/utils/argtools.py", line 200, in wrapper
    value = func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/graph_builder/model_desc.py", line 86, in get_input_signature
    inputs = self.inputs()
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/graph_builder/model_desc.py", line 116, in inputs
    raise NotImplementedError()
NotImplementedError

question about tensor concat

The code l = tf.concat([c, l], 3) seems to concatenate only the outputs of two adjacent layers.
Shouldn't it concatenate the outputs of all previous layers in a dense block?
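
One way to check this is to trace what `l` holds when the same statement is applied repeatedly. The sketch below replaces tensors with lists of labels, so list concatenation stands in for tf.concat along the channel axis. The function and label names are made up, and the sketch assumes that each layer reassigns `l` as in the quoted line.

```python
def dense_block(input_channels, num_layers):
    """Track which feature maps each layer receives when l is reassigned."""
    l = list(input_channels)
    seen_by_layer = []
    for i in range(num_layers):
        seen_by_layer.append(list(l))   # what layer i receives as input
        c = ["layer%d" % i]             # stand-in for the new feature maps
        l = c + l                       # mirrors l = tf.concat([c, l], 3)
    return l, seen_by_layer


final, seen = dense_block(["input"], 3)
print(final)    # ['layer2', 'layer1', 'layer0', 'input']
print(seen[2])  # ['layer1', 'layer0', 'input']
```

Because `l` already contains every earlier output, concatenating only `c` and `l` still gives each layer the feature maps of all preceding layers in the block.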

add_transition layer

I found this in your code:

    def add_transition(name, l):
        shape = l.get_shape().as_list()
        in_channel = shape[3]
        with tf.variable_scope(name) as scope:
            l = BatchNorm('bn1', l)
            l = tf.nn.relu(l)
            l = Conv2D('conv1', l, in_channel, 1, stride=1, use_bias=False, nl=tf.nn.relu)
            l = AvgPooling('pool', l, 2)
        return l

After BN and ReLU, there is a 1x1 conv layer. However, you pass nl=tf.nn.relu. Do you mean that a ReLU is still needed after the conv layer?
The Caffe version of DenseNet is configured differently.
Can you explain this to me?
Thanks.

Question: per_pixel_mean_subtract on test

In cifar10-densenet.py:
Line 116: ds = dataset.Cifar10(train_or_test)
Line 117: pp_mean = ds.get_per_pixel_mean()
Is it valid for the validation set to use statistics computed from all the test data, such as per_pixel_mean?
Thanks in advance
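
A common way to avoid relying on test-set statistics is to compute the per-pixel mean on the training split only and reuse it when preprocessing validation or test images. The sketch below illustrates that convention on tiny 2x2 single-channel images in plain Python. The function names and data are hypothetical, and it is not a statement about what get_per_pixel_mean does internally.

```python
def per_pixel_mean(images):
    """Mean image: average each pixel position over equally sized 2-D images."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(width)]
            for r in range(height)]


def subtract_mean(image, mean):
    """Subtract the mean image from one image, pixel by pixel."""
    return [[p - m for p, m in zip(row, mean_row)]
            for row, mean_row in zip(image, mean)]


train_images = [[[0, 2], [4, 6]], [[2, 4], [6, 8]]]
test_image = [[1, 1], [1, 1]]

train_mean = per_pixel_mean(train_images)            # from training data only
centered_test = subtract_mean(test_image, train_mean)

print(train_mean)     # [[1.0, 3.0], [5.0, 7.0]]
print(centered_test)  # [[0.0, -2.0], [-4.0, -6.0]]
```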