Comments (16)
Hello~ I have tested it on duke->market.
# case 1: origin
cams = pid_cam[pid_i]
index = pid_index[pid_i]
select_cams = No_index(cams, i_cam)
Mean AP: 77.7%
CMC Scores:
top-1 90.8%
top-5 96.6%
top-10 98.0%
# case 2: no select cams
cams = pid_cam[pid_i]
index = pid_index[pid_i]
select_cams = []
Mean AP: 76.5%
CMC Scores:
top-1 89.5%
top-5 96.0%
top-10 97.3%
# case 3: no select cams wrong
cams = pid_cam[pid_i]
index = pid_index[pid_i]
select_cams = list(range(len(cams)))
Mean AP: 8.6%
CMC Scores:
top-1 17.2%
top-5 31.7%
top-10 39.6%
Conclusion:
- "remove the index of i from sample list, which means that you may use the image of index i twice in a mini-batch." seems crucial.
- sampling data according to their camera IDs may bring some improvements (~1%)
from openunreid.
select_cams is meant to give priority to sampling images from different cameras. The values in select_cams are actually instance indexes, not camera indexes. So if you use select_cams = list(range(len(cams))), you will always use the data samples with the first len(cams) indexes, i.e. the first 6 images, which is totally wrong.
See https://github.com/open-mmlab/OpenUnReID/blob/master/openunreid/data/samplers/distributed_identity_sampler.py#L17: the values in the returned list are the indexes i, not the original values j.
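To make the i-versus-j point concrete, here is a minimal sketch of a helper that behaves the way this comment describes. The name no_index, its argument names, and the toy camera list are assumptions for illustration; this is not copied from the repository.

```python
def no_index(values, excluded):
    """Return the positions i whose value j differs from `excluded`.

    Hypothetical re-implementation for illustration only.
    """
    return [i for i, value in enumerate(values) if value != excluded]


# Toy data (not from the thread): camera ID of each image of one identity.
cams = [3, 3, 1, 2]

positions = no_index(cams, 3)
print(positions)                     # [2, 3] -> positions in the list
print([cams[p] for p in positions])  # [1, 2] -> camera IDs at those positions
```

The returned values index into the per-identity lists, so they have to be mapped through those lists before they mean anything about cameras or dataset items.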
from openunreid.
Thanks for your reply!
An example:
select_cams = No_index(cams, i_cam)
print(len(cams), select_cams)
select_cams = list(range(len(cams)))
print(len(cams), select_cams)
21 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
21 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
I think they are similar?
from openunreid.
If you do not want to sample data by camera, simply remove lines 104-118 in https://github.com/open-mmlab/OpenUnReID/blob/master/openunreid/data/samplers/distributed_identity_sampler.py
from openunreid.
If you use select_cams = list(range(len(cams))), then, taking the 21 cameras you have here as an example, you will always sample your data from the first 21 images of each class.
Two things may hurt your performance:
- Some classes have more than 21 images, but the remaining ones (from the 22nd onward) will never be used;
- You did not remove the index i from your sample list, which means that you may use the image of index i twice in a mini-batch.
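The second point can be illustrated with a small, simplified sketch. The function sample_identity_batch and its parameters are hypothetical and only mimic the idea of drawing the remaining images of one identity; the real sampler is more involved.

```python
import random


def sample_identity_batch(num_images, anchor_pos, num_instances, rng,
                          exclude_anchor=True):
    """Return positions of one anchor plus (num_instances - 1) other images.

    Hypothetical helper: positions refer to the images of a single identity.
    """
    candidates = list(range(num_images))
    if exclude_anchor:
        # Drop the anchor so it cannot be drawn a second time.
        candidates.remove(anchor_pos)
    others = rng.sample(candidates, num_instances - 1)
    return [anchor_pos] + others


rng = random.Random(0)
safe = sample_identity_batch(13, 4, 4, rng)
risky = sample_identity_batch(13, 4, 4, rng, exclude_anchor=False)
print(safe, "all distinct:", len(set(safe)) == len(safe))
print(risky, "anchor count:", risky.count(4))
```

With the anchor excluded, every position in the batch is distinct by construction. Without the exclusion, the anchor stays in the candidate pool and can be drawn again, which is the duplicate described above.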
from openunreid.
If the result of list(range(len(cams))) means 21 cameras, does select_cams = No_index(cams, i_cam) also mean camera indexes? They look similar.
"You did not remove the index of i from your sample list, which means that you may use the image of index i twice in a mini-batch."
Thanks for pointing this out.
from openunreid.
select_cams = list(range(len(cams))) gives instance indexes; we can use them to get the indexes of images with the same ID.
The problem may be:
"You did not remove the index of i from your sample list, which means that you may use the image of index i twice in a mini-batch."
But the performance seems weird. I overwrote the log dir, so I can't find the result.
from openunreid.
Please look carefully at https://github.com/open-mmlab/OpenUnReID/blob/master/openunreid/data/samplers/distributed_identity_sampler.py#L17: the values in the returned list are the indexes i (indexes of images within the same pid), not the original values j (the camera IDs of those images within the same pid).
from openunreid.
As I mentioned: with 21 cameras here, for example, you will always sample your data from the first 21 images of each class. Some classes have more than 21 images, but the remaining ones (from the 22nd onward) will never be used. This is the core problem.
Using list(range(len(cams))) is totally wrong. If you do not want to sample data by camera, simply remove lines 104-118 in https://github.com/open-mmlab/OpenUnReID/blob/master/openunreid/data/samplers/distributed_identity_sampler.py
from openunreid.
select_cams = list(range(len(cams))) also means indexes of images within the same pid.
An example:
datasets: {'market1501': 'trainval', 'dukemtmcreid': 'trainval'}
unsup_dataset_indexes: [0,]
cams = pid_cam[pid_i]
index = pid_index[pid_i]
print("-" * 16)
print(cams) # [5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7]
print(index) # [15014, 15015, 15016, 15017, 15018, 15019, 15020, 15021, 15022, 15023, 15024, 15025, 15026]
select_cams = No_index(cams, i_cam)
print(len(cams), select_cams) # 13 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
select_cams = list(range(len(cams)))
print(len(cams), select_cams) # 13 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print("-" * 16)
[5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7]
[15014, 15015, 15016, 15017, 15018, 15019, 15020, 15021, 15022, 15023, 15024, 15025, 15026]
13 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
13 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
select_cams = No_index(cams, i_cam) gives positions into index, i.e. which element of [15014, 15015, 15016, 15017, 15018, 15019, 15020, 15021, 15022, 15023, 15024, 15025, 15026] to take. So list(range(len(cams))) also gives positions into index, because index and cams correspond one-to-one?
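The printed lists above can be replayed in a few lines to see what each candidate list selects. This is a sketch under assumptions: no_index is a hypothetical stand-in for the helper being discussed, and i_cam = 7 is inferred from the printed output (positions 10 to 12 are missing), not stated in the thread.

```python
def no_index(values, excluded):
    """Hypothetical stand-in: positions whose value differs from `excluded`."""
    return [i for i, value in enumerate(values) if value != excluded]


# Lists printed in the example above.
cams = [5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7]
index = [15014, 15015, 15016, 15017, 15018, 15019, 15020,
         15021, 15022, 15023, 15024, 15025, 15026]
i_cam = 7  # assumption: camera of the anchor image

select_cams = no_index(cams, i_cam)     # positions with a different camera
all_positions = list(range(len(cams)))  # every position, anchor included

# Both lists hold positions into `index`; map them to dataset indexes.
other_camera_images = [index[p] for p in select_cams]
all_images = [index[p] for p in all_positions]

# Positions that only the second variant can pick: same camera as the anchor.
extra = sorted(set(all_positions) - set(select_cams))
print(extra, [cams[p] for p in extra])
```

Both variants are position lists of the same kind. They differ only in that the second one also contains the images from the anchor's own camera, including the anchor itself.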
from openunreid.
What I mean is that the core problem hurting the final performance is the following:
The number of cameras is fixed, e.g. 13 in your example.
So, if you use list(range(len(cams))) as the candidate list, you can only sample images from the first len(cams) indexes. For example, if there are 20 images in class A, you can only sample the mini-batch from the first 13 images, while the other 7 images are never considered.
from openunreid.
But the key point is that there are only 13 images in total, since index has length 13?
from openunreid.
Also, I am using the Market and Duke datasets, so there cannot be 13 cameras.
from openunreid.
OK, sorry for misunderstanding cams in list(range(len(cams))). I thought it was the overall list of camera IDs, but I now see it is a variable in the code with the same length as the list of images.
So the conclusion is that sampling data according to their camera IDs is crucial to the final performance.
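For readers following along, camera-prioritized sampling as discussed in this thread could look roughly like the sketch below. It is a simplified illustration, not the repository's code: pick_positive_positions, its parameters, and the fallback rules are assumptions.

```python
import random


def pick_positive_positions(cams, anchor_pos, num_instances, rng):
    """Pick (num_instances - 1) positions of other images of one identity.

    cams[p] is the camera ID of the p-th image of the identity.
    Hypothetical helper for illustration only.
    """
    anchor_cam = cams[anchor_pos]
    # Prefer images taken by a different camera than the anchor.
    candidates = [p for p, cam in enumerate(cams) if cam != anchor_cam]
    if not candidates:
        # Single-camera identity: use any other image, never the anchor.
        candidates = [p for p in range(len(cams)) if p != anchor_pos]
    if not candidates:
        return []  # the identity has only one image
    need = num_instances - 1
    if len(candidates) >= need:
        return rng.sample(candidates, need)  # without replacement
    return rng.choices(candidates, k=need)   # too few: with replacement


cams = [5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7]
picks = pick_positive_positions(cams, 12, 4, random.Random(0))
print(picks, [cams[p] for p in picks])
```

In both branches the anchor position is absent from the candidate list, so the duplicate-image issue raised earlier in the thread cannot occur in this sketch.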
from openunreid.
Yes. It could also be because:
"You did not remove the index of i from your sample list, which means that you may use the image of index i twice in a mini-batch."
I am trying the suggestion to
"just simply remove Line 104-118 in https://github.com/open-mmlab/OpenUnReID/blob/master/openunreid/data/samplers/distributed_identity_sampler.py".
Thank you for your reply.
from openunreid.
You are welcome to post your comparison results here when you finish training. I am also curious how much it would affect the performance.
from openunreid.