
tripletloss's People

Contributors

luhaofang


tripletloss's Issues

How to use it?

Hello,
How do I use your tripletloss Python layer?
I ran 'make pycaffe' and put the folder 'tripletloss' in /usr/lib/python2.7/dist-packages,
but when I start to train with tripletloss-master/example_layer/model/solver.prototxt,
it says:
Creating layer norm2
ImportError: No module named tripletloss.norm2layer
Could you give me some advice, please?
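A hedged workaround (assuming the net prototxt references the layer as module: "tripletloss.norm2layer" and that the tripletloss folder is an importable package): make sure the directory that contains the tripletloss folder is on Python's module search path before the solver is created, for example at the top of train.py:

    import sys
    # Hypothetical path: point this at the directory that CONTAINS the 'tripletloss' folder.
    sys.path.insert(0, '/path/to/tripletloss-master')

    import caffe
    solver = caffe.SGDSolver('tripletloss-master/example_layer/model/solver.prototxt')

Alternatively, placing the layer .py files themselves on the path and referencing them without the package prefix (e.g. module: "norm2layer") matches the pattern used in the prototxt snippets quoted elsewhere in these issues.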

Hard Sample

Hi,
I have searched for information about how to train with the triplet loss, and people always mention the term "hard sample". Could you explain this concept, and how it is done?
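For context (background from the FaceNet paper, not something defined in this repository): a "hard" triplet is one that still violates the margin, i.e. ||f(a) - f(p)||^2 + alpha > ||f(a) - f(n)||^2, and hard-sample mining means preferentially building the training batch from such triplets. A minimal numpy sketch of that selection rule, with all names assumed for illustration:

    import numpy as np

    def hard_triplet_mask(anchors, positives, negatives, margin=0.2):
        # anchors, positives, negatives: (N, D) arrays of embeddings (illustrative only)
        d_ap = np.sum((anchors - positives) ** 2, axis=1)
        d_an = np.sum((anchors - negatives) ** 2, axis=1)
        # A triplet counts as "hard" while it still violates the margin.
        return d_ap + margin > d_an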

Question about Anchors

This may be a silly question, but in order to get convergence, do you always need to use the same anchors for the entire training session, or can the anchors be selected at random from within the DataLayer forward pass?

Thanks!

Late in training, every loss becomes 0.1 and an, ap drop to 0

@luhaofang Hello, I am using your code with CASIA-WebFace as the training set: 10,572 classes and more than 400,000 images. Because the dataset is fairly large, I did not pre-train with softmax but trained with triplet loss from scratch. Late in training every loss is 0.1, an = 0, ap = 0, and when I test with an intermediate caffemodel every image produces the same 128-dimensional feature vector. Why is that? Thanks!

pos and neg are both sorted in descending order; is this a bug?

My understanding is that pos should be sorted in ascending order and neg in descending order, so that the resulting triplets have a larger loss.

Alternatively, consider all combinations of anchor, pos, and neg, and pick the ones with larger loss for training.

training image size ??

hi.

Is the training image size 224x224 (not 256x256)?
Your VGG model has no crop_size (but the original VGG-Face uses crop_size=224)?

Selecting semi-hard examples, possible bug?

Hello,
In the tripletselect layer, we aim to choose semi-hard negative examples, meaning negative examples that are within a margin alpha of the positive example.
In your custom layer, you sort both the a_p and a_n examples so that the highest distance comes first.
Don't we want the highest a_p distance and the smallest a_n distance, so that they are more likely to violate the semi-hard condition?

The following code with my suggested fix:

    archor_feature = bottom[0].data[0]

    for i in range(self.triplet):
        positive_feature = bottom[0].data[i+self.triplet]
        a_p = archor_feature - positive_feature
        ap = np.dot(a_p,a_p)
        aps[i+self.triplet] = ap
    aps = sorted(aps.items(), key = lambda d: d[1], reverse = True)

    for i in range(self.triplet):
        negative_feature = bottom[0].data[i+self.triplet*2]
        a_n = archor_feature - negative_feature
        an = np.dot(a_n,a_n)
        ans[i+self.triplet*2] = an
    #ans = sorted(ans.items(), key = lambda d: d[1], reverse = True)  # guyn - seems to me like a bug, we want the lowest distance to be first
    ans = sorted(ans.items(), key = lambda d: d[1], reverse = False)

What do you think?
Guy

training problem

I am trying to train your base code, but during the process I see a strange phenomenon: the values of ap and an gradually increase during training. Do you know what the problem is?
Here is an example.

init step

I0724 11:03:43.994547 9413 solver.cpp:338] Iteration 0, Testing net (#0)
loss: diff: -0.000300745 ap:0.0187428 an:0.0184421
I0724 11:03:44.687252 9413 solver.cpp:406] Test net output #0: loss = 0.0998072 (* 1 = 0.0998072 loss)
loss: diff: -0.0166375 ap:0.204302 an:0.187664
I0724 11:03:45.168706 9413 solver.cpp:229] Iteration 0, loss = 0.101533
I0724 11:03:45.168732 9413 solver.cpp:245] Train net output #0: loss = 0.101533 (* 1 = 0.101533 loss)
I0724 11:03:45.168741 9413 sgd_solver.cpp:106] Iteration 0, lr = 0.05
fc9_1: 0.0270555
loss: diff: 0.00402641 ap:0.187749 an:0.191775
fc9_1: 0.02706
loss: diff: -0.000589401 ap:0.185847 an:0.185258
fc9_1: 0.0270641
loss: diff: 0.00889902 ap:0.18142 an:0.190319
fc9_1: 0.0270755
loss: diff: -0.00702135 ap:0.179443 an:0.172421
fc9_1: 0.0270836
loss: diff: -0.000574797 ap:0.179911 an:0.179336
fc9_1: 0.0270977
loss: diff: -0.00920136 ap:0.198604 an:0.189403
fc9_1: 0.0271129
loss: diff: -0.00556538 ap:0.195801 an:0.190236
fc9_1: 0.0271284
loss: diff: -0.00923073 ap:0.194738 an:0.185507
fc9_1: 0.0271592
loss: diff: 0.00120996 ap:0.199738 an:0.200948
fc9_1: 0.027195
loss: diff: 0.00612946 ap:0.193083 an:0.199212
fc9_1: 0.0272452
loss: diff: -0.00397289 ap:0.195506 an:0.191533
fc9_1: 0.0273063
loss: diff: -0.00386688 ap:0.195402 an:0.191535
fc9_1: 0.0273628
loss: diff: 0.00654422 ap:0.172184 an:0.178728
fc9_1: 0.0273842
loss: diff: -0.00313556 ap:0.187025 an:0.183889
fc9_1: 0.0274232
loss: diff: 0.00579476 ap:0.197591 an:0.203386
fc9_1: 0.0274561
loss: diff: -0.010954 ap:0.201672 an:0.190718
fc9_1: 0.0274747
loss: diff: 0.0195905 ap:0.170718 an:0.190309
fc9_1: 0.0274855
loss: diff: 0.00869015 ap:0.193575 an:0.202265
fc9_1: 0.0275473
loss: diff: -0.0020541 ap:0.196721 an:0.194667
fc9_1: 0.0275993
loss: diff: -0.00750799 ap:0.204833 an:0.197325

3360 step

loss: diff: 303786.0 ap:229735.0 an:533521.0
I0724 11:03:09.143044 511 solver.cpp:229] Iteration 3360, loss = 5285.04
I0724 11:03:09.143069 511 solver.cpp:245] Train net output #0: loss = 5285.04 (* 1 = 5285.04 loss)
I0724 11:03:09.143077 511 sgd_solver.cpp:106] Iteration 3360, lr = 0.05
fc9_1: 49.6616
loss: diff: 374721.0 ap:515413.0 an:890134.0
fc9_1: 49.8801
loss: diff: -581244.0 ap:2.19052e+06 an:1.60928e+06
fc9_1: 49.9985
loss: diff: 547982.0 ap:352190.0 an:900172.0
fc9_1: 50.1899
loss: diff: 393501.0 ap:617288.0 an:1.01079e+06
fc9_1: 50.5875
loss: diff: 501302.0 ap:433316.0 an:934618.0
fc9_1: 50.9914
loss: diff: -466161.0 ap:870104.0 an:403944.0
fc9_1: 51.5431
loss: diff: -457406.0 ap:1.35256e+06 an:895150.0
fc9_1: 51.9892
loss: diff: 60009.7 ap:679217.0 an:739227.0
fc9_1: 52.2999
loss: diff: -328786.0 ap:879514.0 an:550729.0
fc9_1: 52.7494
loss: diff: -152841.0 ap:813207.0 an:660366.0
fc9_1: 53.1168
loss: diff: -457695.0 ap:1.29021e+06 an:832511.0
fc9_1: 53.2979
loss: diff: 171481.0 ap:1.07467e+06 an:1.24615e+06
fc9_1: 53.524
loss: diff: -172325.0 ap:787410.0 an:615085.0
fc9_1: 53.811
loss: diff: -35747.0 ap:1.9394e+06 an:1.90365e+06
fc9_1: 54.0079
loss: diff: 113833.0 ap:421736.0 an:535569.0
fc9_1: 54.3732
loss: diff: 241538.0 ap:1.02858e+06 an:1.27012e+06
fc9_1: 54.7839
loss: diff: -186023.0 ap:930481.0 an:744458.0
fc9_1: 55.2063
loss: diff: 616794.0 ap:955952.0 an:1.57275e+06
fc9_1: 55.2271
loss: diff: -755828.0 ap:1.74845e+06 an:992624.0
fc9_1: 55.2024
loss: diff: -33648.6 ap:638003.0 an:604355.0

triplet_select

In triplet_select.py,
the anchor, pos, and neg samples are first sorted by distance in descending order, and then the code:
if aps[i][1] >= ans[i][1]:
self.no_residual_list.append(i)
self.tripletlist.append([i,aps[i][0],ans[i][0]])
Does this mean that 10 triplets are taken every time, or is there some other selection scheme?

Sorry to bother you, one more question

@pinguo-luhaofang
If I change the number of outputs of the fully connected layer before norm2layer (i.e. different from the number of outputs used during softmax loss training; I changed it from 3000+ to 10), I find that ap and an both become very large, on the order of 1e2. What causes this? Doesn't norm2layer already normalize the features? And is this change of mine reasonable?
Thanks!

the triplet loss architecture is unsupervised, is this correct?

hi, I am sorry this question may be more conceptual than technical.

When you use the triplet loss architecture, there are no labels. Is this correct?

You just compute the distance between the anchor subset of the batch and the positive subset of the batch, and the distance between the anchor subset of the batch and the negative subset of the batch; and then the resulting loss.

Of course, the underlying very strong premise is that within each batch, the first third is composed of anchor images, the second third of positive images, and the third third of negative images.

Is this correct?

One more question if I may.

In deploy mode, the batch size can be 1 right? Backward propagation through training makes sure that in deploy mode the weights will create a single embedding (batch size 1) from "fc9_1" which is close to embeddings of pictures belonging to the same person.

Is this correct?

Thanks so much.
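For illustration of the batch layout described in the first question (the variable names below are assumptions, not identifiers from the repository; the sizes follow the 30-image / 10-triplet example used elsewhere in these issues):

    n_triplets = 10
    anchors   = batch_features[0:n_triplets]               # first third: anchor images
    positives = batch_features[n_triplets:2*n_triplets]    # second third: positive images
    negatives = batch_features[2*n_triplets:3*n_triplets]  # third third: negative images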

A simple question

@pinguo-luhaofang Is it true that a model trained with triplet loss cannot be used directly for classification, and that at test time you can only feed in two sample images, compare the distance between their output features, and use some threshold to decide whether they are the same person?
Thanks!
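A minimal sketch of the verification-style test described above (all names are illustrative assumptions; the threshold would have to be tuned on a validation set):

    import numpy as np

    def same_person(feat_a, feat_b, threshold=1.0):
        # feat_a, feat_b: embeddings extracted from the trained network for the two images
        dist = np.sum((feat_a - feat_b) ** 2)
        # Distances below the threshold are treated as the same identity.
        return dist < threshold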

how to build this

Hi, I don't know how to build this layer. Which path in Caffe should I put this code in? Can you give me the steps for adding this layer?
Thanks!

Data Layer Creating Problem

Hi, there is a problem when creating the data layer. Could you please tell me how to fix it?
I0614 14:01:23.216370 9957 layer_factory.hpp:77] Creating layer data
I0614 14:01:23.312580 9957 net.cpp:91] Creating Layer data
I0614 14:01:23.312675 9957 net.cpp:399] data -> data
I0614 14:01:23.312741 9957 net.cpp:399] data -> labels
Traceback (most recent call last):
File "train.py", line 89, in
sw = SolverWrapper(solver_prototxt, output_dir,pretrained_model)
File "train.py", line 30, in init
self.solver = caffe.SGDSolver(solver_prototxt)
File "/data/zhaoxin/caffe/tripletloss/tripletloss/tripletloss/datalayer.py", line 82, in setup
layer_params = yaml.load(self.param_str_)
AttributeError: 'DataLayer' object has no attribute 'param_str_'
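For reference (a common cause rather than a confirmed fix for this repository): newer Caffe builds expose the Python layer's parameter string as self.param_str, while older ones used self.param_str_. A hedged tweak to the layer's setup that accepts either name:

    # In datalayer.py's setup(): fall back to the old attribute name if the new one is absent.
    param_str = getattr(self, 'param_str', None) or getattr(self, 'param_str_', '')
    layer_params = yaml.load(param_str)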

No module named norm2layer

Why is it that after I have trained the model, testing a single image gives "No module named norm2layer"? Is it because my Caffe is not configured correctly? Training works fine, and testing images with softmax alone also works, so why does this only happen when I test a single image after training the triplet model? Any advice would be greatly appreciated, thank you! @pinguo-luhaofang

lfw test

Have you tested your triplet model's accuracy on LFW?

AttributeError: 'module' object has no attribute 'text_format'

self.solver_param = caffe_pb2.SolverParameter()
with open(solver_prototxt, 'rt') as f:
pb2.text_format.Merge(f.read(), self.solver_param)
The program fails at this step with AttributeError: 'module' object has no attribute 'text_format'. What is the cause?
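A commonly reported cause (an assumption here, not confirmed against this repository): recent protobuf releases no longer expose text_format through the generated pb2 module, so it has to be imported explicitly:

    from google.protobuf import text_format

    self.solver_param = caffe_pb2.SolverParameter()
    with open(solver_prototxt, 'rt') as f:
        text_format.Merge(f.read(), self.solver_param)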

training problem

Hi,
I am trying to train your base code with Market-1501, and I have removed the fc-9 layer mentioned in #47. But the ap and an outputs are not well separated.

Like this:
loss: ap:1.04867 an:1.04086
niter:: 17601 Loss : 0.110664203763 acc_res : 0.0
loss: ap:1.03278 an:1.00164
niter:: 17602 Loss : 0.11082803458 acc_res : 0.0
loss: ap:1.05027 an:1.04821
niter:: 17603 Loss : 0.102735437453 acc_res : 0.0
loss: ap:1.05265 an:1.06449
niter:: 17604 Loss : 0.0914521738887 acc_res : 0.0
loss: ap:1.03527 an:1.01179
niter:: 17605 Loss : 0.113486751914 acc_res : 0.0
loss: ap:1.05145 an:1.0433
niter:: 17606 Loss : 0.113084629178 acc_res : 0.0

and
loss: ap:1.06208 an:1.08265
niter:: 99993 Loss : 0.0485837906599 acc_res : 0.0
loss: ap:1.10919 an:1.04391
niter:: 99994 Loss : 0.0977101102471 acc_res : 0.0
loss: ap:1.06604 an:1.15973
niter:: 99995 Loss : 0.083872847259 acc_res : 0.0
loss: ap:1.06359 an:0.966018
niter:: 99996 Loss : 0.0970649272203 acc_res : 0.0
loss: ap:1.07257 an:1.01377
niter:: 99997 Loss : 0.10638307035 acc_res : 0.0
loss: ap:1.02597 an:1.08038

Could you give me some suggestions? Thank you. @luhaofang @JoeFannie @Johere

The loss stays high

Hello! I have face images of 485 people, 80 images per person. I have changed the parameters in train.prototxt and the solver many times, but during softmax training the loss keeps hovering between 3 and 8. Is there a way to fix this? Is the amount of data too small? Also, do the faces have to be aligned first?

gradient formulation for unit normalized Image embedding

bottom[0].diff[i] = self.a*((x_n - x_p)/((bottom[0]).num))
bottom[1].diff[i] = self.a*((x_p - x_a)/((bottom[0]).num))
bottom[2].diff[i] = self.a*((x_a - x_n)/((bottom[0]).num))

If we differentiate the loss function with respect to x_a, x_p, and x_n, then the gradients should be
(assuming the embedding of an image is unit-normalized):

bottom[0].diff[i] = self.a*((x_n - x_p)/((bottom[0]).num))
bottom[1].diff[i] = self.a*((- x_a)/((bottom[0]).num))
bottom[2].diff[i] = self.a*((x_a)/((bottom[0]).num))

Please correct me, if I am wrong...
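For reference, a sketch of both derivations (plain calculus, not taken from the repository; which form applies depends on whether the L2 normalization is treated as part of the loss or as a separate norm2 layer with its own backward pass):

\[
L = \lVert x_a - x_p \rVert^2 - \lVert x_a - x_n \rVert^2 + \alpha
\]
Treating $x_a$, $x_p$, $x_n$ as free variables (normalization handled by a separate layer):
\[
\frac{\partial L}{\partial x_a} = 2(x_n - x_p), \quad
\frac{\partial L}{\partial x_p} = 2(x_p - x_a), \quad
\frac{\partial L}{\partial x_n} = 2(x_a - x_n).
\]
Substituting the unit-norm identity $\lVert x_a - x_p \rVert^2 = 2 - 2\,x_a^{\top} x_p$ first gives instead
\[
\frac{\partial L}{\partial x_a} = 2(x_n - x_p), \quad
\frac{\partial L}{\partial x_p} = -2\,x_a, \quad
\frac{\partial L}{\partial x_n} = 2\,x_a.
\]
The first set matches the code quoted above (up to the constant factor absorbed into self.a); the second matches the suggested change.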

Mixed spaces and tabs in the Python indentation

Hello, and thanks for providing the tripletloss code.

I found that the code mixes tabs and spaces for indentation, so it cannot run in an ipython-notebook.

Hi, which feature did you extract?

I completed the training, but I wonder which feature from your deploy.prototxt should be extracted: "fc9_1" (512), fc7 (4096), or something else?

thank you.

deploy.prototxt question

Why does the 'Softmax' layer have a bottom linked to 'norm2' instead of 'fc9_1'? Is this correct?

Bug report

1. In datalayer.py and tripletlosslayer.py, should param_str_ be param_str instead?

2. In train.prototxt:
layer {
name: "accuracy_1"
type: "Accuracy"
bottom: "norm2"
bottom: "labels"
top: "accuracy"
include: { phase: TEST }
}
layer {
name: "loss_cls"
type: "SoftmaxWithLoss"
bottom: "norm2"
bottom: "labels"
top: "loss_cls"
loss_weight: 1
}
Should norm2 here be fc9 instead?

tripletselectlayer possible bug?

Hi, thanks for the code.
In tripletselectlayer.py line 71-73:

bottom[0].diff[self.tripletlist[i][0]] = top[0].diff[i]
bottom[0].diff[self.tripletlist[i][1]] = top[1].diff[i]
bottom[0].diff[self.tripletlist[i][2]] = top[2].diff[i]

Since an image can be selected twice in self.tripletlist (e.g. [[0,10,100], [1,100,8]...] here image id 100 appears as the hard negative sample for anchor 0, but hard positive sample for anchor 1), I suggest changing to code line 71-73 to:

bottom[0].diff[self.tripletlist[i][0]] += top[0].diff[i]
bottom[0].diff[self.tripletlist[i][1]] += top[1].diff[i]
bottom[0].diff[self.tripletlist[i][2]] += top[2].diff[i]

Do you agree?
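A side note on the suggestion above (my own addition, not from the repository): if the layer accumulates with +=, the diff buffer needs to be cleared at the start of backward, otherwise gradients from the previous iteration leak in:

    # Sketch, assuming the loop shown above lives in backward():
    bottom[0].diff[...] = 0
    for i in range(len(self.tripletlist)):
        bottom[0].diff[self.tripletlist[i][0]] += top[0].diff[i]
        bottom[0].diff[self.tripletlist[i][1]] += top[1].diff[i]
        bottom[0].diff[self.tripletlist[i][2]] += top[2].diff[i]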

A very strange error; please take a look

I0817 04:13:15.119854 3010 layer_factory.hpp:74] Creating layer data
I0817 04:13:15.155990 3010 net.cpp:90] Creating Layer data
I0817 04:13:15.156023 3010 net.cpp:368] data -> data
I0817 04:13:15.156049 3010 net.cpp:368] data -> labels
I0817 04:13:15.156061 3010 net.cpp:120] Setting up data
100
I0817 04:13:15.164710 3010 net.cpp:127] Top shape: 30 3 224 224 (4515840)
I0817 04:13:15.164723 3010 net.cpp:127] Top shape: 30 (30)
I0817 04:13:15.164732 3010 layer_factory.hpp:74] Creating layer conv1_1
I0817 04:13:15.164748 3010 net.cpp:90] Creating Layer conv1_1
I0817 04:13:15.164757 3010 net.cpp:410] conv1_1 <- data
I0817 04:13:15.164768 3010 net.cpp:368] conv1_1 -> conv1_1
I0817 04:13:15.164782 3010 net.cpp:120] Setting up conv1_1
I0817 04:13:15.213511 3010 net.cpp:127] Top shape: 30 64 224 224 (96337920)
Segmentation fault (core dumped)

What is this segmentation fault?

tripletselectlayer - computing the distance against the anchor image

When you calculate the distance between the anchor feature and all positive features, you only ever use the first image of the anchor subset (10 images in total, with a batch of 30 as in your example) against all images of the positive subset.

archor_feature = bottom[0].data[0]
        for i in range(self.triplet):
            positive_feature = bottom[0].data[i+self.triplet]
            a_p = archor_feature - positive_feature
            ap = np.dot(a_p,a_p)
            aps[i+self.triplet] = ap

You repeat a similar procedure between the first image in the anchor layer and all images in the negative layer:

for i in range(self.triplet):
            negative_feature = bottom[0].data[i+self.triplet*2]
            a_n = archor_feature - negative_feature
            an = np.dot(a_n,a_n)
            ans[i+self.triplet*2] = an

I am not sure if there is a way around it, but what are all the other images in the anchor layer used for (in your example, the 9 images from image 1 to image 9)? Nothing?
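A hedged sketch of the per-anchor variant this report seems to be asking about (names follow the pasted code; the change itself is only a suggestion, not the repository's behavior):

    for i in range(self.triplet):
        # Use the i-th anchor instead of reusing bottom[0].data[0] for every triplet.
        archor_feature = bottom[0].data[i]
        positive_feature = bottom[0].data[i+self.triplet]
        negative_feature = bottom[0].data[i+self.triplet*2]
        a_p = archor_feature - positive_feature
        a_n = archor_feature - negative_feature
        aps[i+self.triplet] = np.dot(a_p, a_p)
        ans[i+self.triplet*2] = np.dot(a_n, a_n)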

A possible problem in datalayer.py

Hello! Thanks for your tripletloss implementation! I have found a possible logic error, reported below:
Line 52: personname = self.data_container._sample_person.keys()[rand]
Shouldn't rand be index here?
And earlier:
index = max(0, rand - 1)
if index == 0: index = 1
When rand is 1, index is also 1, so the samples found are no longer negative samples.
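A hedged sketch of one way to guarantee a different identity for the negative samples (anchor_index and num_identities are assumed names for illustration, not variables from datalayer.py):

    import random

    # Keep drawing until the candidate differs from the anchor's identity (assumes at least 2 identities).
    neg_index = anchor_index
    while neg_index == anchor_index:
        neg_index = random.randint(0, num_identities - 1)
    personname = self.data_container._sample_person.keys()[neg_index]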

The loss fluctuates between 0 and 0.0x

Hello, I am using a VGG network with about 500,000 images and batch_size=30. After only a small number of iterations, the loss fluctuates between 0 and 0.0x. Is this normal?

Does tripletselectlayer have bug?

In function "forward(self, bottom, top):" in file tripletselectlayer.py:

line 40, "archor_feature = bottom[0].data[0]", archor_feature remains as bottom[0].data[0]. I think archor_feature should iterate from bottom[0].data[0] to bottom[0].data[self.triplet - 1].

The module name is right, but it still fails with boost::python::error_already_set

I0605 20:26:05.052970 10307 layer_factory.hpp:77] Creating layer triplet_select
I0605 20:26:05.668673 10307 net.cpp:84] Creating Layer triplet_select
I0605 20:26:05.668705 10307 net.cpp:406] triplet_select <- pool10
I0605 20:26:05.668715 10307 net.cpp:406] triplet_select <- label
I0605 20:26:05.668722 10307 net.cpp:380] triplet_select -> archor
I0605 20:26:05.668730 10307 net.cpp:380] triplet_select -> positive
I0605 20:26:05.668736 10307 net.cpp:380] triplet_select -> negative
I0605 20:26:05.668946 10307 net.cpp:122] Setting up triplet_select
I0605 20:26:05.668963 10307 net.cpp:129] Top shape: 10 34 (340)
I0605 20:26:05.668968 10307 net.cpp:129] Top shape: 10 34 (340)
I0605 20:26:05.668972 10307 net.cpp:129] Top shape: 10 34 (340)
I0605 20:26:05.668975 10307 net.cpp:137] Memory required for data: 80587760
I0605 20:26:05.668980 10307 layer_factory.hpp:77] Creating layer tripletloss
I0605 20:26:05.669806 10307 net.cpp:84] Creating Layer tripletloss
I0605 20:26:05.669819 10307 net.cpp:406] tripletloss <- archor
I0605 20:26:05.669824 10307 net.cpp:406] tripletloss <- positive
I0605 20:26:05.669828 10307 net.cpp:406] tripletloss <- negative
I0605 20:26:05.669833 10307 net.cpp:380] tripletloss -> loss
terminate called after throwing an instance of 'boost::python::error_already_set'
*** Aborted at 1496665565 (unix time) try "date -d @1496665565" if you are using GNU date ***
PC: @     0x7f1c49afec37 (unknown)

and I didn't change the code below:

#add activation layer may increase the feature's expression
layer {
  name: "triplet_select"
  type: "Python"
  bottom: "fc9_1"
  bottom: "labels"
  top: "archor"
  top: "positive"
  top: "negative"
  python_param {
    module: "tripletselectlayer"
    layer: "TripletSelectLayer"
  }
}
layer {
  name: "tripletloss"
  type: "Python"
  bottom: "archor"
  bottom: "positive"
  bottom: "negative"
  top: "loss"
  python_param {
    module: "tripletlosslayer"
    layer: "TripletLayer"
    param_str: "'margin': 0.2"
  }
  loss_weight: 1
}
