First of all, thank you for your interesting work. But I have some questions about some

Questions about the model (social-stgcnn issue, closed)

Comments (8)

abduallahmohamed commented on July 17, 2024

Hi,

For 1.
We used the relative position because there's no global reference system across the different datasets. Using the delta is a common approach, and you can see it in previous work such as Social-LSTM/GAN. It's a kind of normalization technique for the positions. The adjacency matrix A then holds the similarity between the relative positions. I hope this clarifies your confusion.
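To illustrate the delta idea, here is a minimal sketch in plain Python. The helper name `to_relative` and the choice of a zero displacement for the first step are assumptions made for this example, not code taken from the repository. It shows that two tracks with the same motion but different global coordinates map to the same relative representation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_relative(track: List[Point]) -> List[Point]:
    """Convert absolute (x, y) positions into per-step displacements.

    The first step has no predecessor, so its displacement is set to (0, 0);
    that convention is an assumption of this sketch.
    """
    if not track:
        return []
    rel = [(0.0, 0.0)]
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        rel.append((x1 - x0, y1 - y0))
    return rel


# Same motion recorded in two different coordinate frames (hypothetical data).
track_a = [(1.0, 1.0), (1.5, 1.0), (2.0, 1.5)]
track_b = [(101.0, 51.0), (101.5, 51.0), (102.0, 51.5)]

rel_a = to_relative(track_a)
rel_b = to_relative(track_b)
print(rel_a == rel_b)
```

Because only the displacements are kept, the offset between the two coordinate frames disappears.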

For 2.
I believe you are correct in using permute instead of view.
I developed the following test:

import torch

v = torch.zeros(1, 5, 3, 2)  # batch, features, time, pedestrians
v[:, :, 0, :] = 10  # mark the first time step
v[:, :, 1, :] = 20  # mark the second time step
v[:, :, 2, :] = 30  # mark the third time step

v[:, :, :, 0] += 1  # tag values belonging to the first pedestrian
v[:, :, :, 1] += 2  # tag values belonging to the second pedestrian

print(v)

tensor([[[[11., 12.],
          [21., 22.],
          [31., 32.]],

         [[11., 12.],
          [21., 22.],
          [31., 32.]],

         [[11., 12.],
          [21., 22.],
          [31., 32.]],

         [[11., 12.],
          [21., 22.],
          [31., 32.]],

         [[11., 12.],
          [21., 22.],
          [31., 32.]]]])

Using view:

v.view(v.shape[0],v.shape[2],v.shape[1],v.shape[3])
tensor([[[[11., 12.],
          [21., 22.],
          [31., 32.],
          [11., 12.],
          [21., 22.]],

         [[31., 32.],
          [11., 12.],
          [21., 22.],
          [31., 32.],
          [11., 12.]],

         [[21., 22.],
          [31., 32.],
          [11., 12.],
          [21., 22.],
          [31., 32.]]]])

Using permute:

v.permute(0,2,1,3)
tensor([[[[11., 12.],
          [11., 12.],
          [11., 12.],
          [11., 12.],
          [11., 12.]],

         [[21., 22.],
          [21., 22.],
          [21., 22.],
          [21., 22.],
          [21., 22.]],

         [[31., 32.],
          [31., 32.],
          [31., 32.],
          [31., 32.],
          [31., 32.]]]])

Nonetheless, whether you train with view or permute you will obtain similar results, but view will produce slightly better results, as it has features from all relative time steps, unlike permute, which in the previous example didn't have features from the third time step for the first pedestrian. The main goal of TXP-CNN is to treat the time dimension as a feature channel.
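The difference between the two outputs can be reproduced without PyTorch. The sketch below uses a plain list as a row-major buffer and drops the batch dimension; the function names are made up for this illustration. Reading the buffer with new strides mimics view, while swapping indices before the lookup mimics permute.

```python
F, T, P = 5, 3, 2  # features, time steps, pedestrians (batch dimension dropped)

# Row-major buffer of the test tensor: value = 10 * (t + 1) + (p + 1)
flat = [10 * (t + 1) + (p + 1) for f in range(F) for t in range(T) for p in range(P)]


def view_as_tfp(t, f, p):
    """Element [t, f, p] after reinterpreting the same buffer as shape (T, F, P)."""
    return flat[t * F * P + f * P + p]


def permute_to_tfp(t, f, p):
    """Element [t, f, p] after swapping axes, i.e. the original element [f, t, p]."""
    return flat[f * T * P + t * P + p]


# First output channel (index 0 along the new leading axis) under each operation.
view_channel0 = [[view_as_tfp(0, f, p) for p in range(P)] for f in range(F)]
permute_channel0 = [[permute_to_tfp(0, f, p) for p in range(P)] for f in range(F)]

print(view_channel0)
print(permute_channel0)
```

The first channel under view mixes values from all three time steps, while the first channel under permute contains only first-time-step values, matching the printed tensors above.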

hunterbobo commented on July 17, 2024

Thank you. It is an interesting phenomenon that view() makes it better.
I have another question. Maybe I have misunderstood the code, but I have read your code and the sgan code many times, and it seems there is a small difference between your computation of ADE and sgan's when choosing the best trajectory. For example, as shown in the following picture, there are two trajectories in the scene, the red one and the green one. If we predict 20 times, we get 20 predictions of the red one and 20 predictions of the green one. Your code chooses the best prediction of the red and the best prediction of the green to compute the ADE. But I think sgan is different: they select the best at the scene level, not the trajectory level. sgan computes the error sum of the two trajectories in the scene, and then selects the best scene of the 20 predictions to compute the ADE. Could you please tell me whether I am right or wrong?
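The two selection schemes described above can be written down as a small sketch. This is plain Python with hypothetical helper names and toy data; it is not the evaluation code of either repository, and whether sgan really selects at the scene level is the reading proposed in this comment.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Traj = List[Point]


def ade(pred: Traj, truth: Traj) -> float:
    """Average Euclidean distance between predicted and true points."""
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(truth)


def best_per_trajectory(samples: List[List[Traj]], truth: List[Traj]) -> float:
    """samples[k][i] is sample k for pedestrian i.
    Take the minimum over samples for each pedestrian, then average."""
    n_ped = len(truth)
    return sum(min(ade(s[i], truth[i]) for s in samples) for i in range(n_ped)) / n_ped


def best_per_scene(samples: List[List[Traj]], truth: List[Traj]) -> float:
    """Average over pedestrians inside each sample, then take the minimum over samples."""
    n_ped = len(truth)
    return min(sum(ade(s[i], truth[i]) for i in range(n_ped)) / n_ped for s in samples)


# Toy scene: a "red" and a "green" pedestrian, two sampled scenes.
truth = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
samples = [
    [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 2.0), (1.0, 2.0)]],   # red exact, green off by 1
    [[(0.0, -1.0), (1.0, -1.0)], [(0.0, 1.0), (1.0, 1.0)]],  # red off by 1, green exact
]

traj_level = best_per_trajectory(samples, truth)
scene_level = best_per_scene(samples, truth)
print(traj_level, scene_level)
```

In this toy case the trajectory-level score is 0.0 and the scene-level score is 0.5, because no single sampled scene is best for both pedestrians at once.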

abduallahmohamed commented on July 17, 2024

A property to note regarding the TXP-CNN layer is that it is not permutation invariant: changes in the graph embedding right before TXP-CNN lead to different results. Other than this, if the order of pedestrians is permuted starting from the input to Social-STGCNN, then the predictions are invariant.

You are right regarding how the ADE is computed on my side. For S-GAN, I'm not sure about their implementation of ADE.

What I notice from SGAN's https://github.com/agrimgupta92/sgan/blob/master/scripts/evaluate_model.py#L81 is that they use the raw FDE and then take the minimum using https://github.com/agrimgupta92/sgan/blob/master/scripts/evaluate_model.py#L53. TBH, it doesn't make sense in my case to take the minimum over the scene, as we try to sample from a distribution. For each pedestrian and each time step we have a distribution; I treat the distribution related to each pedestrian as a whole, sample from it, and take the minimum. SGAN doesn't produce a distribution; it produces an expectation, so their case is slightly different from ours.

Let me know if something doesn't sound right.

hunterbobo commented on July 17, 2024

To evaluate fairly, I think you should use the same evaluation method. You could treat the distributions of all pedestrians in the same scene as a whole and sample from them, which means sampling trajectories at the scene level and selecting the best scene; then you would get an ADE comparable with sgan. Or you could calculate the ADE of sgan by selecting the best trajectory. No matter how the trajectories are predicted, selecting the best trajectory will always give an ADE at least as good as selecting the best scene.
And could you please tell me how the other methods listed in Table 2 calculate ADE? Many of them haven't released their code, so I am not sure how they calculate it.

abduallahmohamed commented on July 17, 2024

I’m also not sure how they calculated their ADE; I grabbed the numbers from their papers.

Are you sure that SGAN selects at the scene level? If so, we are closer to Social-LSTM, as it regresses against a bivariate distribution like ours. In a sense, SGAN generates a whole sample of all trajectories, so they are correlated, unlike ours, in which we generate distribution parameters that we need to sample from afterwards.

golnazhabibi3 commented on July 17, 2024

Hi, I am reading the comments and it is still not clear to me how ADE is calculated. Could you confirm whether this is correct: for each time step, you sample 20 points from the distribution, select the one with minimum error, then continue the estimation from that point (treat it as the new starting point, predict again, sample from the distribution, select the best point, and so on)? Thanks!

abduallahmohamed commented on July 17, 2024

Hi @golnazhabibi3, see the code at https://github.com/abduallahmohamed/Social-STGCNN/blob/master/test.py#L95
For each user, I sample 20 trajectories and select the minimum for this user. I sample the whole trajectory as is per user; there's no continuation per point, as our model generates the distribution in a single inference step.
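As a rough sketch of that procedure, assuming the model output for one user is a list of bivariate Gaussian parameters (mu_x, mu_y, sigma_x, sigma_y, rho), one tuple per predicted time step: the function names and the parameter layout are assumptions for illustration and do not mirror test.py line by line.

```python
import math
import random


def sample_trajectory(params, rng):
    """Draw one whole trajectory; every time step is sampled from its own
    bivariate Gaussian, with no feedback from earlier sampled points."""
    traj = []
    for mu_x, mu_y, sx, sy, rho in params:
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mu_x + sx * z1
        y = mu_y + sy * (rho * z1 + math.sqrt(1 - rho * rho) * z2)
        traj.append((x, y))
    return traj


def ade(pred, truth):
    """Average Euclidean distance between predicted and true points."""
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(truth)


def min_ade(params, truth, n_samples=20, seed=0):
    """Sample n_samples whole trajectories for one user and keep the lowest ADE."""
    rng = random.Random(seed)
    return min(ade(sample_trajectory(params, rng), truth) for _ in range(n_samples))


# Hypothetical distribution parameters and ground truth for one user.
params = [(0.0, 0.0, 0.3, 0.3, 0.2), (1.0, 0.5, 0.4, 0.3, -0.1), (2.0, 1.0, 0.5, 0.4, 0.0)]
truth = [(0.1, 0.0), (1.1, 0.4), (2.2, 1.1)]

print(min_ade(params, truth, n_samples=20))
```

The distribution parameters are fixed before any sampling happens, so the 20 draws differ only in their random noise, not in the model's prediction.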

I hope this is clear

golnazhabibi3 commented on July 17, 2024

Thanks for your explanation and clarification!
