vicwer / c3ae_age_estimation
Exploring the Limits of Compact Model for Age Estimation. CVPR2019
License: Apache License 2.0
In file "models/network.py":
Line:69 # avg5 = tf.reduce_mean(conv5, [1, 2], keepdims=True, name='avg_pool')
Line:70 # print(avg5.name, avg5.get_shape())
Line:71 concat = reshape(conv5, [-1, avg5.get_shape()[3] * 3], 'reshape')
Why is Line:69 commented out? With it commented out, Line:71 cannot reference avg5. I removed the "#" from Line:69, but it still doesn't work.
The error message is "InvalidArgumentError (see above for traceback): Incompatible shapes: [60] vs. [960] [[node tower_0/loss_0/sub (defined at ../models/losses.py:14"
My image size is 220*220*3 when I produce the tf_records. Thank you very much!
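The numbers in that error message can be reproduced with plain shape arithmetic. The sketch below assumes a batch of 60 face triplets (180 images) and the 4x4x32 conv_5 output that appears in the logs elsewhere on this page; both are assumptions, not something the report states. Under them, reshaping conv5 directly to rows of width 96 gives 960 rows, while reshaping the globally averaged tensor gives 60, which suggests Line:71 would need to reshape avg5 rather than conv5.

```python
# Shape bookkeeping only; no TensorFlow needed.
def reshape_rows(shape, row_width):
    """Rows produced by reshaping a tensor of `shape` to [-1, row_width]."""
    total = 1
    for dim in shape:
        total *= dim
    if total % row_width:
        raise ValueError("tensor size is not divisible by the row width")
    return total // row_width

triplets = 60              # assumed batch size (matches the "[60]" in the error)
images = triplets * 3      # three crops per face
channels = 32

conv5_shape = (images, 4, 4, channels)  # before global average pooling
avg5_shape = (images, 1, 1, channels)   # after reduce_mean over axes [1, 2]

rows_from_conv5 = reshape_rows(conv5_shape, channels * 3)
rows_from_avg5 = reshape_rows(avg5_shape, channels * 3)
print(rows_from_conv5, rows_from_avg5)
```

If those assumptions hold, the label tensor has 60 entries and only the avg5 reshape matches it.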
It seems that the loss function is a weighted sum of KL divergence and L1 loss. I'm wondering why cross-entropy was not used as the loss function instead. Can you please explain?
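For reference, here is a minimal sketch of a weighted KL plus L1 loss in plain Python. The function names and the `kl_weight` value are hypothetical and not taken from the repository. It also checks the standard identity that cross-entropy equals the target entropy plus the KL divergence, so for a fixed target distribution the two differ only by a constant.

```python
import math

def kl_divergence(target, predicted):
    """KL(target || predicted); terms with zero target mass contribute 0."""
    return sum(t * math.log(t / p) for t, p in zip(target, predicted) if t > 0)

def cross_entropy(target, predicted):
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

def entropy(target):
    return -sum(t * math.log(t) for t in target if t > 0)

def combined_loss(target_dist, predicted_dist, target_age, predicted_age,
                  kl_weight=10.0):  # kl_weight is an illustrative value
    """Weighted sum of a KL term on the distribution and an L1 term on the age."""
    return (kl_weight * kl_divergence(target_dist, predicted_dist)
            + abs(target_age - predicted_age))

target = [0.0, 0.2, 0.8]
predicted = [0.1, 0.3, 0.6]

kl = kl_divergence(target, predicted)
gap = cross_entropy(target, predicted) - entropy(target)
loss = combined_loss(target, predicted, target_age=68.0, predicted_age=65.5)
print(kl, gap, loss)
```

Because the entropy term does not depend on the prediction, the gradients of the KL and cross-entropy terms with respect to the prediction are the same in this setting.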
Does the first fully connected layer depend on how the age vector is divided into bins?
Can somebody explain the feat layer? How were you able to get the same parameter count as mentioned in the paper, i.e. 6156?
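One way to arrive at 6156 is to treat the feat layer as a dense layer from a flattened 4x4x32 feature map to 12 outputs, with biases. This is an interpretation based on the shapes in the logs on this page, not a confirmed description of the paper's layer; the helper name below is hypothetical.

```python
def dense_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs

flattened = 4 * 4 * 32        # conv output of a single image, flattened
age_bins = 12                 # width of the fc_1 / age vector in the logs

feat_from_flattened = dense_params(flattened, age_bins)
feat_from_concat = dense_params(32 * 3, age_bins)  # 96-wide input, as in the "reshape (?, 96)" log line
print(feat_from_flattened, feat_from_concat)
```

The 96-wide input printed in the logs would give a different count, so the two configurations should not be expected to match.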
@vicwer
thanks!!
(3, 64, 64, 3)
conv_1 (?, 62, 62, 32)
avg_1 (?, 31, 31, 32)
conv_2 (?, 29, 29, 32)
avg_2 (?, 14, 14, 32)
conv_3 (?, 12, 12, 32)
avg_3 (?, 6, 6, 32)
conv_4 (?, 4, 4, 32)
tower_0/C3AE/conv_5/truediv:0 (?, 4, 4, 32)
tower_0/C3AE/avg_pool:0 (?, 1, 1, 32)
reshape (?, 96)
fc_1 (?, 12)
fc_2 (?, 1)
Tensor("tower_0/C3AE/fc_1/kernel/Regularizer/l1_regularizer:0", shape=(), dtype=float32, device=/device:GPU:0)
/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2019-08-28 20:05:02.250645: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-28 20:05:02.912984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:04:00.0
totalMemory: 11.91GiB freeMemory: 11.75GiB
2019-08-28 20:05:02.913034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-08-28 20:05:03.342056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-28 20:05:03.342115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2019-08-28 20:05:03.342126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2019-08-28 20:05:03.342558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11366 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:04:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 168507 values, but the requested shape has 145200
[[Node: Reshape = Reshape[T=DT_UINT8, Tshape=DT_INT32, _class=["loc:@random_flip_left_right/Switch_1"]](DecodeRaw, Reshape/shape)]]
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,3,64,64,3], [?,1], [?,12]], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[Node: IteratorGetNext/_1 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_32_IteratorGetNext", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "multi_gpus_train.py", line 144, in <module>
train(finetune=False)
File "multi_gpus_train.py", line 136, in train
_, loss, lr_ = sess.run([train_op, current_loss, learning_rate])
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 168507 values, but the requested shape has 145200
[[Node: Reshape = Reshape[T=DT_UINT8, Tshape=DT_INT32, _class=["loc:@random_flip_left_right/Switch_1"]](DecodeRaw, Reshape/shape)]]
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,3,64,64,3], [?,1], [?,12]], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[Node: IteratorGetNext/_1 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_32_IteratorGetNext", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Caused by op 'Reshape', defined at:
File "multi_gpus_train.py", line 144, in <module>
train(finetune=False)
File "multi_gpus_train.py", line 76, in train
imgs = tf.reshape(imgs, (-1, imgs.get_shape()[2], imgs.get_shape()[3], imgs.get_shape()[4]))
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6199, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/huhu/anaconda3/envs/tf12/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 168507 values, but the requested shape has 145200
[[Node: Reshape = Reshape[T=DT_UINT8, Tshape=DT_INT32, _class=["loc:@random_flip_left_right/Switch_1"]](DecodeRaw, Reshape/shape)]]
[[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,3,64,64,3], [?,1], [?,12]], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[Node: IteratorGetNext/_1 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_32_IteratorGetNext", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
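The two element counts in this error factor cleanly into square RGB image sizes, which hints at where the mismatch comes from. The helper below is a hypothetical diagnostic, not part of the repository; it only assumes the raw bytes encode a square 3-channel image.

```python
import math

def square_rgb_side(num_values, channels=3):
    """Return the side length if num_values == side * side * channels, else None."""
    pixels, remainder = divmod(num_values, channels)
    side = math.isqrt(pixels)
    if remainder or side * side != pixels:
        return None
    return side

stored_side = square_rgb_side(168507)    # values actually found in the record
expected_side = square_rgb_side(145200)  # values the reshape asks for
print(stored_side, expected_side)
```

Under that assumption, at least one record holds a 237x237x3 image while the reader expects 220x220x3, so the images would need to be resized to one fixed size before being written to the tf_records (or the reader's size changed to match).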
I have tried it on wiki and imdb, and the result is far from what I expected. Can anyone reproduce it?
I notice in the paper that crops of the face at different scales are used. How can they share the CNN if they have different scales?
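Judging only from the shapes printed in the logs on this page, each sample seems to arrive as three crops already resized to 64x64, so the same layers can process all of them. The sketch below traces the spatial size through unpadded 3x3 convolutions and 2x2 average pools; the layer layout is inferred from those log lines and is an assumption.

```python
def trace_spatial_sizes(size, num_conv_pool=3):
    """Trace the side length through 3x3 'valid' convs and 2x2 average pools."""
    sizes = []
    for _ in range(num_conv_pool):
        size -= 2           # 3x3 convolution without padding
        sizes.append(size)
        size //= 2          # 2x2 average pooling with stride 2
        sizes.append(size)
    size -= 2               # conv_4, with no pooling afterwards
    sizes.append(size)
    return sizes

crop_sides = [64, 64, 64]   # three crops of one face, each resized to 64x64 (assumed)
sizes = trace_spatial_sizes(crop_sides[0])
final_sides = [trace_spatial_sizes(side)[-1] for side in crop_sides]
fused_width = 32 * len(crop_sides)  # 32 channels per crop after global pooling
print(sizes, final_sides, fused_width)
```

Since every crop ends with the same feature shape, the three 32-channel vectors can be concatenated into the 96-wide tensor seen in the "reshape (?, 96)" line.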
@vicwer
(3, 64, 64, 3)
conv_1 (?, 62, 62, 32)
avg_1 (?, 31, 31, 32)
conv_2 (?, 29, 29, 32)
avg_2 (?, 14, 14, 32)
conv_3 (?, 12, 12, 32)
avg_3 (?, 6, 6, 32)
conv_4 (?, 4, 4, 32)
reshape (?, 96)
fc_1 (?, 12)
fc_2 (?, 1)
Tensor("tower_0/C3AE/fc_1/kernel/Regularizer/l1_regularizer:0", shape=(), dtype=float32, device=/device:GPU:0)
l1_loss.get_shape(): (?,)
Traceback (most recent call last):
File "multi_gpus_train.py", line 145, in <module>
train(finetune=False)
File "multi_gpus_train.py", line 97, in train
loss = model.compute_loss()
File "/home/zzjh08/C3AE/models/run_net.py", line 27, in compute_loss
loss_l1 = l1_loss(self.pred, self.age_labels)
File "/home/zzjh08/C3AE/models/losses.py", line 14, in l1_loss
_, k_idx = tf.nn.top_k(l1_loss, tf.cast(tf.reduce_prod(l1_loss.get_shape()) * cfg.ohem_ratio, tf.int32))
File "/home/zzjh08/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/zzjh08/.local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1543, in reduce_prod
reduction_indices),
File "/home/zzjh08/.local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1275, in _ReductionDims
return range(0, array_ops.rank(x))
File "/home/zzjh08/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 368, in rank
return rank_internal(input, name, optimize=True)
File "/home/zzjh08/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 388, in rank_internal
input_tensor = ops.convert_to_tensor(input)
File "/home/zzjh08/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1050, in convert_to_tensor
as_ref=False)
File "/home/zzjh08/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/zzjh08/.local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 250, in _tensor_shape_tensor_conversion_function
"Cannot convert a partially known TensorShape to a Tensor: %s" % s)
ValueError: Cannot convert a partially known TensorShape to a Tensor: (?,)
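The traceback shows `l1_loss.get_shape()` returning `(?,)`, i.e. the batch dimension is unknown when the graph is built, so it cannot be fed to `tf.reduce_prod`. A likely workaround (an assumption, not a confirmed fix from the author) is to take the element count at run time, for example with `tf.size(l1_loss)`, before multiplying by `cfg.ohem_ratio`. The plain-Python sketch below shows the same hard-example selection with the count taken from the actual data; `ohem_mean` is a hypothetical name.

```python
def ohem_mean(losses, ratio):
    """Average the hardest `ratio` fraction of per-sample losses."""
    # k is computed from the actual number of samples, not a static shape.
    k = max(1, int(len(losses) * ratio))
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / k

per_sample_l1 = [1.0, 4.0, 2.0, 3.0]
hard_loss = ohem_mean(per_sample_l1, ratio=0.5)
print(hard_loss)
```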
Why is my loss NaN? 10 : nan lr: 0.001
Hi, @vicwer. I'm very confused about this sentence in the paper: 'a fully connected layer with semantic distribution is inserted in between feature layer yn and the regression layer yn'. Does it mean that there are three fc layers, or some other change to the original plain model? Looking forward to your answer. Thanks a lot.
At the end of the code (multi_gpus_train.py), where the ckpt is saved, there are two parameters: cfg.train.max_batches and cfg.train.num_samples. I guess num_samples is the number of samples in our dataset, right? I still don't understand what max_batches is used for. Could you explain it to me? Thank you very much.
Some elements of model.feats are negative, because the fc layer's weights_initializer draws values between -limit and limit, so kl_loss becomes NaN (tf.log of a negative number). Below is my result:
model.feats:[[-0.17209125 0.07770187 -0.1429439 -0.94504386 0.09144928 0.40108332
0.06493281 -0.5548259 0.21889538 0.43835896 0.28741086]] (1, 11)
kl_loss:[nan]
tf.log cannot take a negative input.
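One common way to keep the logarithm defined is to pass the raw outputs through a softmax before the KL term, so every entry is positive and the vector sums to 1. Whether the repository intends this is not stated here; the sketch below simply applies a softmax to the values reported above.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Raw fc outputs copied from the report above; several are negative.
feats = [-0.17209125, 0.07770187, -0.1429439, -0.94504386, 0.09144928,
         0.40108332, 0.06493281, -0.5548259, 0.21889538, 0.43835896,
         0.28741086]

probs = softmax(feats)
log_probs = [math.log(p) for p in probs]  # defined for every entry
print(min(probs), sum(probs))
```

Adding a small epsilon inside the log is a further safeguard when a probability can underflow to zero.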
In file "prepare_data/gen_data_bathch.py"
Line 15: 'age_vector': tf.FixedLenFeature([12], tf.float32),
The last "," should be ","
I want the Morph and FGNet datasets to run C3AE.
Can you provide a pretrained model? Thx a lot.
An error occurs when running multi_gpus_train.py:
AttributeError:'EasyDict' object has no attribute 'use_se_module'
Hello author, when training, did you use MTCNN or RetinaFace to align and crop the face data?
Looking forward to your reply, thanks.
age_vector = float(line.strip().split(' ')[2:])
TypeError: float() argument must be a string or a number, not 'list'
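line.strip().split(' ')[2:] produces a list. Won't casting a list directly to float raise an error?
Why does everyone else seem to be able to run it successfully? Is there something wrong with my train.txt?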
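The TypeError is expected: `float()` accepts a single string or number, not a list. A minimal fix is to convert the fields one by one. The line layout used below (path, age, then the age vector) is an assumption about train.txt, and `parse_label_line` is a hypothetical helper.

```python
def parse_label_line(line):
    """Split one label line into path, age and age vector (assumed layout)."""
    fields = line.strip().split(' ')
    path = fields[0]
    age = float(fields[1])
    # Convert each remaining field separately instead of float(list).
    age_vector = [float(value) for value in fields[2:]]
    return path, age, age_vector

sample = "dataset/0001.png 68 0 0 0 0 0 0 0.2 0.8 0 0 0 0\n"
path, age, age_vector = parse_label_line(sample)
print(path, age, age_vector)
```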
I don't understand age_Yn_vector in the label. Could you tell me what it is?
The output of fc1 can be negative during training, and a negative value is not valid when computing the KL loss. How would you explain or modify this?
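One plausible reading is that this vector is the two-point age representation, where an age is spread over its two neighbouring bins in proportion to how close it is to each. The sketch below assumes 12 bins spaced 10 years apart (0, 10, ..., 110), matching the 12-wide age vector in the logs; the bin layout and function name are assumptions.

```python
def two_point_vector(age, bin_width=10, num_bins=12):
    """Spread an age over its two adjacent bins; all other bins stay at 0."""
    max_age = bin_width * (num_bins - 1)
    age = min(max(age, 0), max_age)        # clamp into the covered range
    vector = [0.0] * num_bins
    low = int(age // bin_width)
    fraction = (age - low * bin_width) / bin_width
    vector[low] = 1.0 - fraction
    if fraction > 0:                       # the upper neighbour exists here
        vector[low + 1] = fraction
    return vector

vector_68 = two_point_vector(68)
print(vector_68)
```

For age 68 this puts roughly 0.2 on the bin for 60 and 0.8 on the bin for 70.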
|->img_list/img_list.txt
|->train_list/train.txt
|->dataset/*.png
In the paper, images of one face are cropped at three granularity levels. What are the exact sizes of the three images? @vicwer
Hello. Are you the author of C3AE, or did you reproduce the code from the paper?
I am confused about some parts of the paper. It says, "Our results on with/without pretrained process are very similar."
In fact, I got a poor result when training on imdb-wiki and then on MORPH2. If I train only on MORPH2, the average MAE on the test set hardly reaches 2.75; mine is nearly 3.7.
I would really like to know the reason, and whether I did something wrong, because many papers are difficult to reproduce. Thank you very much.