wenxinxu / resnet-in-tensorflow
Re-implement Kaiming He's deep residual networks in tensorflow. Can be trained with cifar10.
License: MIT License
How to test? Please list detailed code. Thank you.
The code on your page is not detailed enough, and I don't know how to test this ResNet. Could you please list the detailed code? Thank you very much. @wenxinxu
resnet-in-tensorflow/cifar10_train.py
Line 225 in c1ef9f4
DuplicateFlagError: The flag 'version' is defined twice. First at C:\Users\khazi\AppData\Local\Continuum\anaconda3\Lib\site-packages\ipkernel_launcher.py and second at C:\Users\khazi\AppData\Local\Continuum\anaconda3\Lib\site-packages\ipkernel_launcher.py
Python Version = 3.7.3
Tensorflow Version = 1.14.0
When I call inference, I get an error:
Traceback (most recent call last):
File "/home/forever/PycharmProjects/PIG/resnet.py", line 313, in
conv9_out = inference(x, FLAGS.num_residual_blocks, reuse=False)
File "/home/forever/PycharmProjects/PIG/resnet.py", line 194, in inference
assert conv3.get_shape().as_list()[1:] == [8, 8, 64]
AssertionError
Can you tell me where I should change my code?
def create_variables(name, shape, initializer=tf.contrib.layers.xavier_initializer(), is_fc_layer=False):
    if is_fc_layer is True:
        regularizer = tf.contrib.layers.l2_regularizer(scale=FLAGS.weight_decay)
    else:
        regularizer = tf.contrib.layers.l2_regularizer(scale=FLAGS.weight_decay)
    new_variables = tf.get_variable(name, shape=shape, initializer=initializer,
                                    regularizer=regularizer)
    return new_variables
Are the two regularizer lines in the if/else branches the same?
The example below is not enough. Please give an exact example of test code.
Test
The test() function in the class Train() helps you predict. It returns the softmax probability with shape [num_test_images, num_labels]. You need to prepare and pre-process your test data and pass it to the function. You may either use your own checkpoints or the pre-trained ResNet-110 checkpoint I uploaded. You may write the following lines at the end of the cifar10_train.py file
train = Train()
test_image_array = ... # Better to be whitened in advance. Shape = [-1, img_height, img_width, img_depth]
top1_error, loss = train.test(test_image_array)
Run the following commands in the command line:
python cifar10_train.py --test_ckpt_path='model_110.ckpt-79999'
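The comment in the snippet above says the test images are "better to be whitened in advance". A minimal sketch of per-image whitening follows, assuming the usual definition: subtract the image's own mean and divide by its standard deviation, with a floor on the deviation so a constant image does not divide by zero. The helper name `whiten_image` is hypothetical, and the sketch is not claimed to match the repository's own preprocessing; it works on a flat list of pixel values to stay dependency-free.

```python
import math


def whiten_image(pixels):
    """Return a zero-mean, unit-variance copy of one image.

    `pixels` is a flat list of numbers (all channels of one image).
    Hypothetical helper for illustration only.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Floor the deviation so a constant image does not divide by zero.
    std = max(math.sqrt(variance), 1.0 / math.sqrt(n))
    return [(p - mean) / std for p in pixels]


whitened = whiten_image([0, 64, 128, 255])
print(whitened)
```

Whatever preprocessing is chosen, it should be the same at training time and at test time.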
When I predict with the model, the result is not the same as in evaluation, and the difference is large. I fetched fc_weight, and the weight saved to the model and the weight restored from the model are not the same. Maybe something is wrong with the model.
Hi,
Firstly, I cannot reproduce the best reported result (6.7%). I set 'is_full_validation' to 'True', kept the other settings the same as in the source code, and ran 'cifar10_train.py'. My best result is only 7.22% at about 77809 iterations, and my second best is 7.28% at about 69207 iterations. Maybe there are some tips that I have missed. Would you give me some suggestions?
Secondly, I notice that the validation curve is more unstable than in the original paper. When I run the code, it doesn't seem to converge, and the results on the validation set are still oscillating at the end. Is there something wrong?
I tried to test accuracy on the whole validation set using the provided ckpt, so I modified cifar10_train.py as below
# Initialize the Train object
train = Train()
# Start the training session
# train.train()
validation_array, validation_labels = read_in_all_images([vali_dir],
is_random_label=VALI_RANDOM_LABEL)
predictions=train.test(validation_array)
vali_accu=np.mean((np.argmax(predictions,1)==validation_labels.astype(int)).astype(float))
print 'total accu on vali is %f'%vali_accu
But the result is extremely low
total accu on vali is 0.334300
Then I trained my own checkpoint from scratch and got a much better accuracy
total accu on vali is 0.9161000
Is there anything wrong with my test process?
Why are you using tf.contrib.layers.xavier_initializer instead of tf.contrib.layers.variance_scaling_initializer()?
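Thank you for sharing.
There is something I don't quite understand and would like to ask about.
Using your code, I set
num_residual_blocks = 5 (32 layers),
loaded the checkpoint model_110.ckpt-79999,
and then continued training on the CIFAR-10 dataset for 2000 steps.
Why is the top-1 error rate around 15%?
Shouldn't it be around 7%?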
When I run this, I encounter a problem. It shows 'InvalidArgumentError: Input to reshape is a tensor with 8 values, but the requested shape has 64'. The problem is located at 'mean, variance = tf.nn.moments(input_layer, axes=[0, 1, 2])'. When I change it to 'mean, variance = tf.nn.moments(input_layer, axes=[0])' it works, but with 'axes=[0, 1, 2]' it fails. I don't know why. Can you help me?
In cifar10_main.py, in the train() function, the author comments: "Want to validate once before training. You may check the theoretical validation."
What does this mean? Thanks a lot.
I get an error: NotFoundError (see above for traceback): Tensor name "truediv_1/ExponentialMovingAverage_1" not found in checkpoint files model_110.ckpt-79999
I've changed the number of residual blocks to 18 to get 110 layers. It doesn't help.
UnrecognizedFlagError Traceback (most recent call last)
in
----> 1 train_dir = 'logs_' + FLAGS.version + '/'
~\AppData\Local\Continuum\anaconda3\lib\site-packages\tensorflow\python\platform\flags.py in getattr(self, name)
82 # a flag.
83 if not wrapped.is_parsed():
---> 84 wrapped(_sys.argv)
85 return wrapped.getattr(name)
86
~\AppData\Local\Continuum\anaconda3\lib\site-packages\absl\flags_flagvalues.py in call(self, argv, known_only)
631 suggestions = _helpers.get_flag_suggestions(name, list(self))
632 raise _exceptions.UnrecognizedFlagError(
--> 633 name, value, suggestions=suggestions)
634
635 self.mark_as_parsed()
Hi, thanks for your code :)
When I run 'cifar10_train.py' I get an error:
line 40,"new_variables = tf.get_variable(name, shape=shape, initializer=initializer,
regularizer=regularizer)"
TypeError: __init__() got multiple values for keyword argument 'dtype'
The batch_normalization_layer() function doesn't compute the population statistics, i.e. the population mean and variance. The implemented part only handles the training procedure (batch statistics), but at test time one needs the population statistics.
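To make the distinction concrete, here is a minimal dependency-free sketch of batch normalization on a single scalar feature that keeps exponential moving averages as an estimate of the population statistics. The names `RunningStats` and `batch_norm`, and the default `decay` and `eps` values, are assumptions for illustration; this is not the repository's batch_normalization_layer().

```python
class RunningStats:
    """Tracks exponential moving averages of the batch mean and variance."""

    def __init__(self, decay=0.9):
        self.decay = decay      # assumed value, not taken from the repository
        self.mean = 0.0
        self.variance = 1.0

    def update(self, batch):
        """Compute batch statistics and fold them into the running averages."""
        n = len(batch)
        batch_mean = sum(batch) / n
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
        self.mean = self.decay * self.mean + (1 - self.decay) * batch_mean
        self.variance = self.decay * self.variance + (1 - self.decay) * batch_var
        return batch_mean, batch_var


def batch_norm(batch, stats, training, eps=1e-3):
    """Normalize with batch statistics when training, running averages otherwise."""
    if training:
        mean, var = stats.update(batch)
    else:
        mean, var = stats.mean, stats.variance
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]


stats = RunningStats()
train_out = batch_norm([1.0, 3.0], stats, training=True)
test_out = batch_norm([2.0], stats, training=False)
```

At test time the output for one example then no longer depends on which other examples happen to share its batch.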
Has anyone tested it on CIFAR-100? What is the performance?
Why does the validation error fluctuate so much?
Can I change 'num_residual_blocks' to get a plot like your training curve?
If I set 'num_residual_blocks' = 3, is this a 20-layer ResNet?
If I set 'num_residual_blocks' = 5, is this a 32-layer ResNet?
If I set 'num_residual_blocks' = 9, is this a 56-layer ResNet?
If I set 'num_residual_blocks' = 18, is this a 110-layer ResNet?
Is that right?
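All four pairs listed above fit the depth rule 6n + 2 used for the CIFAR-10 ResNets in the original paper, where n is the number of residual blocks per stage. A short sketch of the arithmetic (the function name `resnet_depth` is hypothetical):

```python
def resnet_depth(num_residual_blocks):
    """Depth of a CIFAR-10 style ResNet with n blocks in each of 3 stages."""
    # 3 stages x n blocks x 2 conv layers, plus the first conv and the final fc.
    return 6 * num_residual_blocks + 2


for n in (3, 5, 9, 18):
    print(n, resnet_depth(n))
```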
I only want to fine-tune the fc layer. How can I do that? Thank you.
This code is written for Python 2.7; libraries like cPickle do not work on Python 3.7.
Hi. I am using this project as practice for understanding CNNs more deeply. Since this model takes 80000 steps to finish training, I was trying to use the uploaded checkpoint of step 79999 to accelerate the training process. However, when I typed the following command
python cifar10_train.py --is_use_ckpt=True --test-ckpt_path='model_110.ckpt-79999'
an error saying "no such file or directory" showed up. What might be the potential problem? Thank you very much.
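A "no such file or directory" error usually means the path does not resolve from the current working directory. One way to check, sketched below, is to list the files on disk whose names begin with the checkpoint prefix. The helper `find_checkpoint_files` is hypothetical; it assumes only that the checkpoint is stored as one or more files whose names start with the prefix passed on the command line.

```python
import glob
import os


def find_checkpoint_files(ckpt_prefix):
    """Return the files on disk whose names start with the checkpoint prefix."""
    pattern = glob.escape(os.path.abspath(ckpt_prefix)) + '*'
    return sorted(glob.glob(pattern))


files = find_checkpoint_files('model_110.ckpt-79999')
if files:
    print('Found:', files)
else:
    print('No checkpoint files found relative to', os.getcwd())
```

If the list is empty, either the checkpoint was not downloaded into the directory the script is run from, or the path given on the command line needs to be adjusted.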
I got this error when I ran python_train.py.
Can anyone please tell me how to resolve it?
Model restored from model_110.ckpt-79999
0 batches finished!
10 batches finished!
20 batches finished!
30 batches finished!
40 batches finished!
50 batches finished!
60 batches finished!
70 batches finished!
Traceback (most recent call last):
File "cifar10_train.py", line 426, in
top1_error, loss = train.test(test_image_array)
ValueError: too many values to unpack
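This error is what Python raises when a sequence with more than two items is unpacked into two names, which is consistent with test() returning one array of softmax probabilities rather than a (top1_error, loss) pair. A minimal sketch, assuming that return shape, of deriving a top-1 error from such an array; `top1_error` here is a hypothetical helper written with plain lists, and the sample values are made up for illustration.

```python
def top1_error(probabilities, labels):
    """Fraction of rows whose highest-probability class differs from the label."""
    wrong = 0
    for row, label in zip(probabilities, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted != label:
            wrong += 1
    return wrong / len(labels)


# Made-up probabilities for three images and two classes.
probabilities = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [1, 0, 0]
print(top1_error(probabilities, labels))
```

With NumPy arrays the same comparison can be written with an argmax over axis 1.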
Hello sir:
I ran the demo on my own dataset, but I ran into problems. The top-1 error is 0 during training, but the validation top-1 error is about 0.7. My training set has 1300 images and my validation set has 400. Thanks!
I am working on Python 3 and have changed cPickle to pickle and data = dicts['data'] to data = dicts.get('data').
I am encountering this problem in cifar10_input.py:
ValueError: all the input arrays must have same number of dimensions
Traceback (most recent call last):
File "/data/tmp/pycharm_project_979/cifar10_train.py", line 425, in
train.train()
File "/data/tmp/pycharm_project_979/cifar10_train.py", line 86, in train
all_data, all_labels = prepare_train_data(padding_size=FLAGS.padding_size)
File "/data/tmp/pycharm_project_979/cifar10_input.py", line 176, in prepare_train_data
data, label = read_in_all_images(path_list, is_random_label=TRAIN_RANDOM_LABEL)
File "/data/tmp/pycharm_project_979/cifar10_input.py", line 96, in read_in_all_images
data = np.concatenate((data, batch_data))
When I print(data.shape) it shows (0, 3072), and print(batch_data) shows None.
How can I fix the problem?
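The None reported for batch_data is what dict.get() returns when the key is absent, so the lookup of 'data' is failing silently. One possible cause, assuming the batch was unpickled with bytes keys (not stated in this report), is that the key is b'data' rather than 'data'. A minimal sketch of a lookup that tries both forms and fails loudly instead of returning None; `get_field` is a hypothetical helper and the sample batch is made up.

```python
def get_field(batch, name):
    """Look up `name` under a str key or its bytes form; raise if both are absent."""
    for key in (name, name.encode('ascii')):
        if key in batch:
            return batch[key]
    raise KeyError(name)


# Made-up batch with bytes keys.
batch = {b'data': [1, 2, 3], b'labels': [0]}
print(batch.get('data'))         # the str key is missing, so .get() returns None
print(get_field(batch, 'data'))  # finds the value under b'data'
```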
Hello, I want to know how to train the model with a GPU. Now when I execute "python cifar10_train.py" it only uses the CPU. Please tell me how to train the model with a GPU. Thank you!
resnet-in-tensorflow/resnet.py
Line 136 in 8ba8d89
Can you please highlight the relevant section in the paper or in the original implementation?
Thank you
Why did you choose 40000 as the first step to change lr? It seems that changing lr at a smaller step works better.
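For readers who want to experiment with the boundary, a piecewise-constant schedule can be sketched as below. Only the 40000 boundary comes from the question above; the function name, the initial rate, the second boundary, and the decay factor are hypothetical values chosen for illustration.

```python
def learning_rate(step, init_lr=0.1, decay_steps=(40000, 60000), decay_factor=0.1):
    """Piecewise-constant schedule: multiply by decay_factor at each boundary."""
    lr = init_lr
    for boundary in decay_steps:
        if step >= boundary:
            lr *= decay_factor
    return lr


for step in (0, 39999, 40000, 60000):
    print(step, learning_rate(step))
```

Moving the first boundary earlier is then a one-argument change, for example `decay_steps=(30000, 60000)`.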
TRAIN_RANDOM_LABEL = False # Want to use random label for train data?
VALI_RANDOM_LABEL = False # Want to use random label for validation?
I succeeded in training the model on my PC,
but I can't get the test accuracy.
Is there no way to run the test only?
When I checked the test(self, test_image_array) method,
nothing calls it.
Hi,
I am new to ResNet, so I would like to ask how I can feed my own data into the code. Furthermore, I need to solve a regression task, so could you give some information about how to modify the code for this task?
Thank you.
I have been running inference with a small number of images and then training; the code only runs for one step and then breaks with the following error:
step 0, loss = 1.13 (14.0 examples/sec; 0.642 sec/batch)
Traceback (most recent call last):
File "", line 1, in
runfile('C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master/wth.py', wdir='C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master')
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Fariha/Desktop/MS/CervCancer/tensorflow-resnet-master/wth.py", line 76, in
image_tensor = sess.run(error)
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
run_metadata_ptr)
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 984, in _run
self._graph, fetches, feed_dict_string, feed_handles=feed_handles)
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 410, in init
self._fetch_mapper = _FetchMapper.for_fetch(fetches)
File "C:\Users\Fariha\Anaconda3\envs\py36\lib\site-packages\tensorflow\python\client\session.py", line 227, in for_fetch
(fetch, type(fetch)))
TypeError: Fetch argument None has invalid type <class 'NoneType'>
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.CancelledError'>, Run call was cancelled
Hi @wenxinxu ,
Was the model model_110.ckpt-79999 fine-tuned from another model, such as a Caffe model, or trained entirely from scratch on the CIFAR-10 dataset?
Thanks
Hi, I want to apply your ResNet to my dataset. I have created a dataset similar to CIFAR-10, in binary format. Can anyone help me use my dataset instead of the CIFAR-10 training data?
`   fo = open(path, 'rb')
    dicts = pickle.load(fo)
    fo.close()
    data = dicts['data']
    if is_random_label is False:
        label = np.array(dicts[b'labels'])
    else:
        labels = np.random.randint(low=0, high=10, size=10000)
        label = np.array(labels)
    return data, label`
2. After changing it to dicts = pickle.load(fo, encoding='bytes'), the program continues to run, but data = dicts['data'] raises KeyError: 'data'. When I check dicts.keys(), the result is dict_keys([b'data', b'labels', b'batch_label', b'filenames']). Why does the letter b appear before each key?
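The b prefix marks a bytes object. Strings written by Python 2's pickle are byte strings, and loading them in Python 3 with encoding='bytes' leaves them as bytes, so the dictionary keys are b'data' rather than 'data'. The sketch below reproduces this with a tiny hand-written Python 2 style pickle (an illustration, not a real CIFAR-10 batch) and shows one way to get str keys back; `decode_keys` is a hypothetical helper.

```python
import pickle

# A Python 2 protocol-0 pickle of {'data': 1}, written out by hand for illustration.
PY2_PICKLE = b"(dp0\nS'data'\np1\nI1\ns."

as_bytes = pickle.loads(PY2_PICKLE, encoding='bytes')
print(as_bytes)  # the key is a bytes object, shown with a b prefix


def decode_keys(batch):
    """Return a copy of `batch` with bytes keys decoded to str."""
    return {k.decode('ascii') if isinstance(k, bytes) else k: v
            for k, v in batch.items()}


print(decode_keys(as_bytes)['data'])
```

Indexing with the bytes key directly, as in dicts[b'data'], also works without any conversion.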
Like this: when I run the code in the terminal,
it just prints
Train top1 error =
Validation top1 error = 0.4200
Validation loss =
So why is there no output for the train top1 error and the validation loss?
Thank you very much!