I'm using a pretrained osnet_ain model to predict Re-ID embeddings of a specific size

Output Shapes of Model During Training and Evaluation Differs about deep-person-reid HOT 1 CLOSED

bmmtstb commented on May 24, 2024

It took a while, but I think I understand it now. I'll write down a few notes in case someone else stumbles upon the same issue, and then close this issue.

Looking at the engine, e.g. the ImageSoftmaxEngine: during training, forward_backward() computes the loss between the pids and the model's prediction.

deep-person-reid/torchreid/engine/image/softmax.py

Line 86 in 566a56a

loss = self.compute_loss(self.criterion, outputs, pids)

In this case, the model's prediction is a tensor of shape [(B,) num_classes] containing a score for every class. (These scores are logits rather than probabilities, because they have not been passed through a Softmax, but the analogy holds.) Still, by passing them to the loss function, the last linear layer learns to predict the class from the embeddings. And because there are no "real" target embeddings to compute a loss against during training, the model has to predict classes so that its output can be compared to the targets.
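To make the two output shapes concrete, here is a minimal NumPy sketch of the behaviour described above. It is not torchreid's code: `ToyReIDModel`, its weight matrices, and the `cross_entropy` helper are hypothetical stand-ins, and the `training` flag only mimics how a network switches between training and evaluation mode.

```python
import numpy as np


class ToyReIDModel:
    """Hypothetical stand-in for a Re-ID network (not torchreid code)."""

    def __init__(self, in_dim, feat_dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.w_embed = rng.normal(size=(in_dim, feat_dim))      # toy "backbone"
        self.w_cls = rng.normal(size=(feat_dim, num_classes))   # last linear layer
        self.training = True

    def __call__(self, x):
        embeddings = x @ self.w_embed        # shape [B, feat_dim]
        if not self.training:
            return embeddings                # evaluation: embeddings only
        return embeddings @ self.w_cls       # training: logits [B, num_classes]


def cross_entropy(logits, pids):
    """Mean cross-entropy; the softmax is applied here, inside the loss."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(pids)), pids].mean())


model = ToyReIDModel(in_dim=16, feat_dim=8, num_classes=5)
x = np.random.default_rng(1).normal(size=(4, 16))   # batch of B = 4 inputs
pids = np.array([0, 3, 1, 3])                        # person IDs as class indices

logits = model(x)                    # training mode: one score per class
loss = cross_entropy(logits, pids)   # comparable to the integer targets

model.training = False
embeddings = model(x)                # evaluation mode: one embedding per input

print(logits.shape, embeddings.shape)
```

The same input batch yields a `[B, num_classes]` tensor in training mode and a `[B, feat_dim]` tensor in evaluation mode, which is exactly the shape difference this issue is about.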

During testing, we assume that our model predicts useful embeddings, so we can use them directly. Let's have a look at engine._evaluate, where _feature_extraction() is called:

deep-person-reid/torchreid/engine/engine.py

Lines 379 to 385 in 566a56a

    
    print('Extracting features from query set ...')
    qf, q_pids, q_camids = _feature_extraction(query_loader)
    print('Done, obtained {}-by-{} matrix'.format(qf.size(0), qf.size(1)))

    print('Extracting features from gallery set ...')
    gf, g_pids, g_camids = _feature_extraction(gallery_loader)
    print('Done, obtained {}-by-{} matrix'.format(gf.size(0), gf.size(1)))

The same embedding model predicts the embeddings (qf & gf) for both the query and the gallery. Then we can validate whether the embeddings for person X in the query are close to the embeddings for person X in the gallery. This is done by computing the distance between every embedding in the query and every embedding in the gallery. A smaller distance means that the embeddings are closer, and therefore that it is more likely that the same person is shown. If the embeddings of each individual are close to each other and still far away from those of every other individual, the goal is reached.
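The query-to-gallery comparison can be sketched as follows. This assumes squared Euclidean distance as the metric; the helper name `squared_euclidean_distmat` and the tiny 2-D embeddings are made up for illustration and are not taken from the repository.

```python
import numpy as np


def squared_euclidean_distmat(qf, gf):
    """Pairwise squared Euclidean distances, shape [num_query, num_gallery]."""
    q_sq = (qf ** 2).sum(axis=1, keepdims=True)    # [num_query, 1]
    g_sq = (gf ** 2).sum(axis=1)[np.newaxis, :]    # [1, num_gallery]
    dist = q_sq + g_sq - 2.0 * qf @ gf.T           # ||q||^2 + ||g||^2 - 2 q.g
    return np.maximum(dist, 0.0)                   # clip tiny negative round-off


qf = np.array([[0.0, 0.0], [1.0, 1.0]])               # 2 query embeddings
gf = np.array([[0.0, 1.0], [1.0, 1.0], [3.0, 4.0]])   # 3 gallery embeddings

distmat = squared_euclidean_distmat(qf, gf)
print(distmat)
```

Row i of `distmat` holds the distances from query i to every gallery embedding, so the second query (identical to the second gallery entry) gets a distance of 0 in column 1.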

Looking at the evaluation (here the Python version of eval_market1501), we first obtain the indices that sort each row of the distance matrix (smallest to largest).

deep-person-reid/torchreid/metrics/rank.py

Lines 107 to 108 in 566a56a

    
    indices = np.argsort(distmat, axis=1)
    matches = (g_pids[indices] == q_pids[:, np.newaxis]).astype(np.int32)

Therefore, g_pids[indices] gives, for every query sample, the gallery person IDs sorted from the closest to the farthest gallery embedding. Comparing each row with the corresponding query ID yields the binary matches matrix. If we now take the first max_rank columns of matches, which correspond to the most likely candidates, we can check whether they contain the correct target ID.
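A small worked example of these two lines, with a made-up distance matrix and made-up person IDs. It is a simplified sketch: it leaves out any additional filtering (for instance by camera ID) that the real evaluation may apply, and `hit_within_max_rank` and `rank1_accuracy` are illustrative names.

```python
import numpy as np

distmat = np.array([
    [0.9, 0.1, 0.5],   # distances from query 0 to gallery items 0..2
    [0.4, 0.8, 0.3],   # distances from query 1 to gallery items 0..2
])
q_pids = np.array([7, 4])      # person IDs of the two query samples
g_pids = np.array([4, 7, 9])   # person IDs of the three gallery samples

# Per query: gallery indices ordered from smallest to largest distance.
indices = np.argsort(distmat, axis=1)

# Per query: 1 where the gallery ID at that rank equals the query ID.
matches = (g_pids[indices] == q_pids[:, np.newaxis]).astype(np.int32)

max_rank = 2
hit_within_max_rank = matches[:, :max_rank].any(axis=1)
rank1_accuracy = matches[:, 0].mean()

print(g_pids[indices])   # sorted gallery IDs per query
print(matches)
```

Here query 0 (ID 7) finds its match at rank 1, while query 1 (ID 4) only finds it at rank 2, so the rank-1 accuracy is 0.5 and both queries count as hits within `max_rank = 2`.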

So yes, it kind of makes sense to have different outputs. But in my opinion, it would be better to always return both the embeddings and a probability distribution (e.g. Softmax) expressing the certainty that an embedding belongs to each class. Then there wouldn't have to be a difference between training and testing...
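A rough sketch of what such a unified output could look like, again in plain NumPy. `forward_both` and the random weights are hypothetical and only illustrate the idea; nothing here reflects how the library is actually implemented.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def forward_both(x, w_embed, w_cls):
    """Hypothetical forward pass that always returns (embeddings, probs)."""
    embeddings = x @ w_embed               # [B, feat_dim]
    probs = softmax(embeddings @ w_cls)    # [B, num_classes], rows sum to 1
    return embeddings, probs


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
w_embed = rng.normal(size=(16, 8))
w_cls = rng.normal(size=(8, 5))

embeddings, probs = forward_both(x, w_embed, w_cls)
print(embeddings.shape, probs.shape)
```

With this signature the caller picks whichever output it needs, regardless of whether the model is in training or evaluation mode.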
