wenxinxu / resnet-in-tensorflow

A re-implementation of Kaiming He's deep residual networks in TensorFlow. Can be trained on CIFAR-10.

License: MIT License

Python 100.00%

resnet-in-tensorflow's Issues

how to test

How do I test the model? Please list the detailed code. Thank you.
The code on your page is not detailed enough, and I don't know how to test this ResNet. Could you please list the detailed code? Thank you very much. @wenxinxu

reuse should be 'True' in test phase. Shouldn't it?

resnet-in-tensorflow/cifar10_train.py

Line 225 in c1ef9f4

    
           logits = inference(self.test_image_placeholder, FLAGS.num_residual_blocks, reuse=False)

DuplicateFlagError: The flag 'version' is defined twice.

DuplicateFlagError: The flag 'version' is defined twice. First at C:\Users\khazi\AppData\Local\Continuum\anaconda3\Lib\site-packages\ipkernel_launcher.py and second at C:\Users\khazi\AppData\Local\Continuum\anaconda3\Lib\site-packages\ipkernel_launcher.py

Python Version = 3.7.3
Tensorflow Version = 1.14.0

inference Error

When I call inference, I get an error:
Traceback (most recent call last):
File "/home/forever/PycharmProjects/PIG/resnet.py", line 313, in
conv9_out = inference(x, FLAGS.num_residual_blocks, reuse=False)
File "/home/forever/PycharmProjects/PIG/resnet.py", line 194, in inference
assert conv3.get_shape().as_list()[1:] == [8, 8, 64]
AssertionError

Can you tell me where I should change my code?

function create_variables in resnet.py

def create_variables(name, shape, initializer=tf.contrib.layers.xavier_initializer(), is_fc_layer=False):
    if is_fc_layer is True:
        regularizer = tf.contrib.layers.l2_regularizer(scale=FLAGS.weight_decay)
    else:
        regularizer = tf.contrib.layers.l2_regularizer(scale=FLAGS.weight_decay)

    new_variables = tf.get_variable(name, shape=shape, initializer=initializer,
                                    regularizer=regularizer)
    return new_variables

The two regularizer lines are identical. Is that intended?

Please give an exact example for testing

The example below is not enough. Please give an exact example of test code.

Test

The test() function in the class Train() helps you predict. It returns the softmax probability with shape [num_test_images, num_labels]. You need to prepare and pre-process your test data and pass it to the function. You may either use your own checkpoints or the pre-trained ResNet-110 checkpoint I uploaded. You may write the following lines at the end of the cifar10_train.py file:
train = Train()
test_image_array = ... # Better to be whitened in advance. Shape = [-1, img_height, img_width, img_depth]
top1_error, loss = train.test(test_image_array)

Run the following commands in the command line:

If you want to use my checkpoint.

python cifar10_train.py --test_ckpt_path='model_110.ckpt-79999'

Prediction does not give the same result as evaluation

When I run prediction with the model, the result differs substantially from the evaluation result. I fetched fc_weight, saved it with the model, and after restoring the model the weight is not the same. Maybe something is wrong with the model.

The result of the resnet-32

Hi,
Firstly, I cannot reproduce the best reported result (6.7%). With 'is_full_validation' set to 'True' and all other settings kept the same as in the source code, I ran 'cifar10_train.py'. My best result was only 7.22% at about 77809 iterations, and the second best was 7.28% at about 69207 iterations. Maybe there are some tips that I have missed. Could you give me some suggestions?
Secondly, I notice that the validation curve is less stable than the results in the original paper. When I run the code, it doesn't seem to converge: the results on the validation set still oscillate at the end. Is there something wrong?

Whole validation accuracy using provided model_110.ckpt-79999 is extremely low

I tried to test accuracy on the whole validation set using the provided checkpoint, so I modified cifar10_train.py as below:

# Initialize the Train object
train = Train()
# Start the training session
# train.train()

validation_array, validation_labels = read_in_all_images([vali_dir],
                                                         is_random_label=VALI_RANDOM_LABEL)
predictions=train.test(validation_array)
vali_accu=np.mean((np.argmax(predictions,1)==validation_labels.astype(int)).astype(float))
print 'total accu on vali is %f'%vali_accu

But the result is extremely low:

total accu on vali is 0.334300

Then I trained my own checkpoint from scratch and got a much better accuracy:

total accu on vali is 0.9161000

Is there anything wrong during my test process?

about weight initialization

Why are you using tf.contrib.layers.xavier_initializer() instead of tf.contrib.layers.variance_scaling_initializer()?

load checkpoint

Can I load checkpoint model_110.ckpt-79999 into ResNet-32?

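Thank you for sharing.
There is something I don't quite understand and would like to ask about.
Using your code, I set
num_residual_blocks = 5 (32 layers),
loaded the checkpoint model_110.ckpt-79999,
and then continued training on the CIFAR-10 dataset for 2000 steps.
Why is the top-1 error around 15%?
Shouldn't it be around 7%?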

Batch_Normalization

When I run this, I encounter a problem. It shows 'InvalidArgumentError: Input to reshape is a tensor with 8 values, but the requested shape has 64'. The problem is located at 'mean, variance = tf.nn.moments(input_layer, axes=[0, 1, 2])'. When I change it to 'mean, variance = tf.nn.moments(input_layer, axes=[0])', it works, but with 'axes=[0, 1, 2]' it fails. I don't know why. Can you help me?

Want to validate once before training.

In the train() function in cifar10_main.py, the author comments: "Want to validate once before training. You may check the theoretical validation."
What does this mean? Thanks a lot.

please help error with checkpoint

I get the error NotFoundError (see above for traceback): Tensor name "truediv_1/ExponentialMovingAverage_1" not found in checkpoint files model_110.ckpt-79999
I've changed the number of residual blocks to 18 to get 110 layers, but it doesn't help.

UnrecognizedFlagError: Unknown command line flag 'f'

UnrecognizedFlagError Traceback (most recent call last)
in
----> 1 train_dir = 'logs_' + FLAGS.version + '/'

~\AppData\Local\Continuum\anaconda3\lib\site-packages\tensorflow\python\platform\flags.py in __getattr__(self, name)
82 # a flag.
83 if not wrapped.is_parsed():
---> 84 wrapped(_sys.argv)
85 return wrapped.__getattr__(name)
86

~\AppData\Local\Continuum\anaconda3\lib\site-packages\absl\flags_flagvalues.py in __call__(self, argv, known_only)
631 suggestions = _helpers.get_flag_suggestions(name, list(self))
632 raise _exceptions.UnrecognizedFlagError(
--> 633 name, value, suggestions=suggestions)
634
635 self.mark_as_parsed()
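
This error typically shows up when the script is run inside a Jupyter/IPython kernel: the kernel is commonly launched with an extra `-f <connection file>` argument, which the flag parser then sees and rejects. A minimal sketch of one possible workaround, assuming that is the cause here, is to drop that argument from `sys.argv` before `FLAGS` is first read. The helper name `strip_kernel_args` is hypothetical and not part of the repository.

```python
import sys


def strip_kernel_args(argv):
    """Return a copy of argv without a '-f <file>' pair or a '-f=<file>' item."""
    cleaned = []
    skip_next = False
    for arg in argv:
        if skip_next:
            # This is the value that followed a bare '-f'; drop it too.
            skip_next = False
            continue
        if arg == '-f':
            skip_next = True
            continue
        if arg.startswith('-f='):
            continue
        cleaned.append(arg)
    return cleaned


# Apply once, before any FLAGS.<name> access triggers flag parsing.
sys.argv = strip_kernel_args(sys.argv)
```

Running the script from a plain terminal with `python cifar10_train.py` avoids the extra argument altogether.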

the error in file 'resnet.py' (how to run with tensorflow 1.0.0)

Hi, thanks for your code :)
When I run 'cifar10_train.py' I get an error:
line 40,"new_variables = tf.get_variable(name, shape=shape, initializer=initializer,
regularizer=regularizer)"

TypeError: __init__() got multiple values for keyword argument 'dtype'

About batch normalization

The batch_normalization_layer() function doesn't compute the population statistics, i.e. the population mean and variance. The implemented part only covers the training procedure (batch statistics), but at test time one needs the population statistics.
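
A common remedy is to keep exponential moving averages of the batch mean and variance during training and to normalize with those averages at test time. The sketch below shows only that bookkeeping, in plain Python and for a single channel. The class name `RunningStats` and the decay value are illustrative assumptions, not code from the repository.

```python
class RunningStats:
    """Tracks moving averages of batch mean and variance for one channel."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.mean = 0.0  # running estimate of the population mean
        self.var = 1.0   # running estimate of the population variance

    def update(self, batch):
        """Compute batch statistics and fold them into the running averages."""
        n = len(batch)
        batch_mean = sum(batch) / n
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
        self.mean = self.decay * self.mean + (1 - self.decay) * batch_mean
        self.var = self.decay * self.var + (1 - self.decay) * batch_var
        return batch_mean, batch_var

    def normalize(self, x, eps=1e-3):
        """Test-time normalization using the running (population) estimates."""
        return (x - self.mean) / (self.var + eps) ** 0.5


stats = RunningStats(decay=0.5)
batch_mean, batch_var = stats.update([1.0, 3.0])
```

During training the layer would still normalize with `batch_mean` and `batch_var`; only the test path switches to the running values.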

What about results on cifar100

Has anyone tested it on CIFAR-100? What is the performance?

Why does the validation error fluctuate so much?

How do I get the 'Training curve'?

If I would like to get a plot like your training curve, can I just change 'num_residual_blocks'?
If I set 'num_residual_blocks' = 3, is this ResNet-20?
If I set 'num_residual_blocks' = 5, is this ResNet-32?
If I set 'num_residual_blocks' = 9, is this ResNet-56?
If I set 'num_residual_blocks' = 18, is this ResNet-110?

Is that OK?
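
The four pairs listed above all follow the 6n + 2 rule used for CIFAR-10 ResNets in the original paper: three stages of n residual blocks with two convolutions each, plus the first convolution and the final fully connected layer. A quick check, where the function name `resnet_depth` is only illustrative:

```python
def resnet_depth(num_residual_blocks):
    """Layer count of a CIFAR-style ResNet with n blocks per stage (6n + 2)."""
    return 6 * num_residual_blocks + 2


depths = {n: resnet_depth(n) for n in (3, 5, 9, 18)}
print(depths)
```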

want to finetune the fc layer

I only want to fine-tune the fc layer. How can I do that? Thank you.
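
One general approach in TensorFlow 1.x is to pass only the fully connected layer's variables to the optimizer through the `var_list` argument of `minimize`. The sketch below shows just the selection step on stand-in objects. It assumes the fc variables live under a variable scope named 'fc', and the variable names shown are made up for illustration, so check the actual names in your graph.

```python
from types import SimpleNamespace


def select_by_scope(variables, scope='fc'):
    """Keep only variables whose name starts with '<scope>/'."""
    prefix = scope + '/'
    return [v for v in variables if v.name.startswith(prefix)]


# Stand-ins for what tf.trainable_variables() would return (names are hypothetical).
fake_variables = [
    SimpleNamespace(name='conv0/conv:0'),
    SimpleNamespace(name='fc/fc_weights:0'),
    SimpleNamespace(name='fc/fc_bias:0'),
]
fc_vars = select_by_scope(fake_variables)

# With real TF 1.x variables, the list would then be used as:
#   train_op = optimizer.minimize(loss, var_list=fc_vars)
```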

Working on python 2.7

This code is written for Python 2.7; libraries like cPickle do not work on Python 3.7.
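
For the cPickle part specifically, a common compatibility pattern is to fall back to the standard `pickle` module, which replaces cPickle in Python 3. This is a minimal sketch and does not cover the other Python 2 constructs in the code (such as print statements):

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3: cPickle no longer exists as a separate module

# The rest of the code can keep calling pickle.load / pickle.dump unchanged.
payload = pickle.dumps({'labels': [0, 1, 2]})
restored = pickle.loads(payload)
```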

Error "no such file or directory" while training the model using the uploaded checkpoint

Hi. I am using this project as practice to understand CNNs more deeply. Since this model takes 80000 steps to finish training, I tried to use the uploaded checkpoint from step 79999 to speed up the training process. However, when I typed the following command
python cifar10_train.py --is_use_ckpt=True --test-ckpt_path='model_110.ckpt-79999'
an error saying "no such file or directory" showed up. What might be the potential problem? Thank you very much.

value error due to too many values to unpack

I get this error when I run python_train.py.
Can anyone please tell me how to resolve it?

Traceback (most recent call last):

File "cifar10_train.py", line 426, in Model restored from model_110.ckpt-79999
0 batches finished!
10 batches finished!
20 batches finished!
30 batches finished!
40 batches finished!
50 batches finished!
60 batches finished!
70 batches finished!

top1_error, loss = train.test(test_image_array)

ValueError: too many values to unpack
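
According to the README text quoted earlier on this page, test() returns a single array of softmax probabilities with shape [num_test_images, num_labels], so unpacking it into two names fails as soon as it has more than two rows. Assuming that description is accurate, the result can be kept in one variable and the top-1 error computed from it when labels are available. The helper below is a hypothetical sketch that uses plain lists in place of the returned array.

```python
def top1_error(predictions, labels):
    """Fraction of rows whose highest-probability class differs from the label."""
    wrong = 0
    for probs, label in zip(predictions, labels):
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        if predicted != label:
            wrong += 1
    return wrong / float(len(labels))


# predictions = train.test(test_image_array)   # one return value, not two
predictions = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]
labels = [1, 0, 0]
error = top1_error(predictions, labels)
```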

The validation loss is so big?

Hello sir:
I ran the demo on my own dataset, but I have run into many problems. The top-1 error is 0 during training, but the validation top-1 error is about 0.7. My training set has 1300 samples and my validation set has 400. Thanks!

all the input arrays must have same number of dimensions

I am working on Python 3 and have changed cPickle to pickle, and data = dicts['data'] to data = dicts.get('data').

I am encountering the following error in cifar10_input.py:
ValueError: all the input arrays must have same number of dimensions

Traceback (most recent call last):
File "/data/tmp/pycharm_project_979/cifar10_train.py", line 425, in
train.train()
File "/data/tmp/pycharm_project_979/cifar10_train.py", line 86, in train
all_data, all_labels = prepare_train_data(padding_size=FLAGS.padding_size)
File "/data/tmp/pycharm_project_979/cifar10_input.py", line 176, in prepare_train_data
data, label = read_in_all_images(path_list, is_random_label=TRAIN_RANDOM_LABEL)
File "/data/tmp/pycharm_project_979/cifar10_input.py", line 96, in read_in_all_images
data = np.concatenate((data, batch_data))

When I print(data.shape) it shows (0, 3072), and print(batch_data) shows None.

How can I fix the problem?

train the model with gpu

Hello, I want to know how to train the model on a GPU. When I execute "python cifar10_train.py" it only uses the CPU. Please tell me how to train the model on a GPU. Thank you!

Why conv1 in first block is not followed by batch norm and relu

resnet-in-tensorflow/resnet.py

Line 136 in 8ba8d89

    
           conv1 = tf.nn.conv2d(input_layer, filter=filter, strides=[1, 1, 1, 1], padding='SAME')

Can you please highlight the necessary section in the paper or in the original implementation.

Thank you

set step of lr decay

Why did you choose 40000 as the first step at which to change the learning rate? It seems that changing the learning rate at a smaller step works better.
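
For readers experimenting with this, a step-wise schedule of the kind discussed here can be sketched as below. Only the first boundary (40000) comes from the question. The second boundary, the initial rate and the decay factor are assumptions chosen for illustration, and the function name is hypothetical.

```python
def learning_rate(step, init_lr=0.1, decay_steps=(40000, 60000), decay_factor=0.1):
    """Piecewise-constant schedule: multiply by decay_factor at each boundary passed."""
    lr = init_lr
    for boundary in decay_steps:
        if step >= boundary:
            lr *= decay_factor
    return lr


# Moving the first boundary earlier, as the question suggests, is a one-argument change.
earlier = learning_rate(35000, decay_steps=(30000, 60000))
```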

I don't understand why we would use random labels for training. Can someone help me? Thank you.

TRAIN_RANDOM_LABEL = False # Want to use random label for train data?
VALI_RANDOM_LABEL = False # Want to use random label for validation?

Fail to read the file and run training

I have followed the README and get this error while running training.

Is there no way to run only the test?

I succeeded in training the model on my PC,
but I can't get the test accuracy.

Is there no way to run only the test?
When I checked the test(self, test_image_array) method,
I found that nothing calls it.

Start working on resnet

Hi,
I am new to ResNet, so I would like to ask how I can feed my own data into the code. Furthermore, I need to solve a regression task, so could you give me some information about how I can modify the code to do that?
Thank you.

Resnet_train Error

I have been running inference with a small number of images and then training; the code only runs for one step and then breaks with the following error:

step 0, loss = 1.13 (14.0 examples/sec; 0.642 sec/batch)
Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master/wth.py', wdir='C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master')

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master/wth.py", line 76, in
image_tensor = sess.run(error)

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
run_metadata_ptr)

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 984, in _run
self._graph, fetches, feed_dict_string, feed_handles=feed_handles)

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 410, in __init__
self._fetch_mapper = _FetchMapper.for_fetch(fetches)

File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 227, in for_fetch
(fetch, type(fetch)))

TypeError: Fetch argument None has invalid type <class 'NoneType'>

INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.CancelledError'>, Run call was cancelled

About pre-train model

Hi @wenxinxu ,

Was the model model_110.ckpt-79999 fine-tuned from another model, such as a Caffe model, or trained entirely from scratch on the CIFAR-10 dataset?

Thanks

Custom dataset

Hi, I want to apply your ResNet to my dataset. I have created the dataset in a binary format similar to CIFAR-10. Can anyone help me use my dataset instead of the CIFAR-10 training data?

_read_one_batch

fo = open(path, 'rb')
dicts = pickle.load(fo)
fo.close()

data = dicts['data']
if is_random_label is False:
    label = np.array(dicts[b'labels'])
else:
    labels = np.random.randint(low=0, high=10, size=10000)
    label = np.array(labels)
return data, label

1. When execution reaches dicts = pickle.load(fo), it raises: UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128). Have you never run into this?

2. When I change it to dicts = pickle.load(fo, encoding='bytes'), the program continues, but data = dicts['data'] raises KeyError: 'data'. After inspecting dicts.keys(), I found the result is dict_keys([b'data', b'labels', b'batch_label', b'filenames']). Why does the letter b appear before each key?
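
The b prefix marks a bytes object. The CIFAR-10 batch files were pickled under Python 2, where ordinary strings are byte strings, and with encoding='bytes' Python 3 keeps them as bytes instead of decoding them, so the keys have to be looked up as b'data' or converted first. A minimal Python 3 sketch of the conversion follows. The helper name `load_batch` is hypothetical, and the temporary file merely stands in for a real batch file.

```python
import os
import pickle
import tempfile


def load_batch(path):
    """Load a pickled batch and return a dict whose keys are str, not bytes."""
    with open(path, 'rb') as fo:
        raw = pickle.load(fo, encoding='bytes')
    return {
        (key.decode('ascii') if isinstance(key, bytes) else key): value
        for key, value in raw.items()
    }


# Stand-in file with bytes keys, mimicking what encoding='bytes' yields for CIFAR-10.
with tempfile.NamedTemporaryFile(suffix='.pkl', delete=False) as tmp:
    pickle.dump({b'data': [1, 2, 3], b'labels': [0, 1, 2]}, tmp)

batch = load_batch(tmp.name)
os.remove(tmp.name)
```

After this conversion, dicts['data'] and dicts['labels'] work with plain string keys. Only the keys are decoded: values such as the filenames remain bytes.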

Why do I only get the validation top-1 error, without the train top-1 error and validation loss? Thank you very much!

When I run the code in the terminal,
I just get:
Train top1 error =
Validation top1 error = 0.4200
Validation loss =

Why is there no output for the train top-1 error and the validation loss?

thank you very much!