Comments (5)

simmonssong commented on August 16, 2024

Thank you.
Question 2. I found the definition in Social-LSTM [1]: a joint distribution of several independent two-dimensional Gaussian distributions.
Question 1. In the ST-GCN [2] model, kernel_size belongs to the spatial convolution on the graph, where the adjacency matrix is time-invariant. If I understand correctly, it is a learnable kernel, just as in regular CNNs. In your paper, however, the adjacency matrix is time-variant and non-learnable, so I think torch.einsum('nctv,tvw->nctw', (x, A)) is a better fit, and the kernel_size parameter can be removed.

The complete code is as follows. I'm testing whether this change affects the results.

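import torch
import torch.nn as nn


class ConvTemporalGraphical(nn.Module):
    """Convolution along the time axis followed by graph aggregation with A."""

    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 t_kernel_size=1,
                 t_stride=1,
                 t_padding=0,
                 t_dilation=1,
                 bias=True):
        super().__init__()
        # Checked in forward() against the number of adjacency matrices in A.
        self.kernel_size = kernel_size
        # The kernel spans t_kernel_size time steps and a single node.
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=(t_kernel_size, 1),
            padding=(t_padding, 0),
            stride=(t_stride, 1),
            dilation=(t_dilation, 1),
            bias=bias)

    def forward(self, x, A):
        # x: (N, C, T, V); A: (T, V, V), one adjacency matrix per time step.
        assert A.size(0) == self.kernel_size
        x = self.conv(x)
        # For each time step t, multiply the node features by A[t].
        x = torch.einsum('nctv,tvw->nctw', (x, A))
        return x.contiguous(), A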
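
To make the index pattern 'nctv,tvw->nctw' explicit, here is a minimal plain-Python sketch of what that contraction computes, using nested lists instead of tensors. The function name graph_aggregate and the toy values are hypothetical and are not part of the repository; in practice torch.einsum performs this sum.

```python
def graph_aggregate(x, A):
    """Compute out[n][c][t][w] = sum over v of x[n][c][t][v] * A[t][v][w].

    x is a nested list of shape (N, C, T, V); A is a nested list of shape
    (T, V, W), i.e. one matrix per time step.
    """
    N, C, T, V = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    assert len(A) == T, "A needs one matrix per time step"
    W = len(A[0][0])
    out = [[[[0.0] * W for _ in range(T)] for _ in range(C)] for _ in range(N)]
    for n in range(N):
        for c in range(C):
            for t in range(T):
                for w in range(W):
                    # Node w collects features from every node v, weighted by A[t][v][w].
                    out[n][c][t][w] = sum(
                        x[n][c][t][v] * A[t][v][w] for v in range(V))
    return out


# Toy example: 1 sample, 1 channel, 2 time steps, 2 nodes.
x = [[[[1.0, 2.0], [3.0, 4.0]]]]
A = [
    [[1.0, 0.0], [0.0, 1.0]],  # t = 0: identity, features stay in place
    [[0.0, 1.0], [1.0, 0.0]],  # t = 1: the two nodes swap features
]
result = graph_aggregate(x, A)
print(result)  # [[[[1.0, 2.0], [4.0, 3.0]]]]
```

Because A is indexed by t, each time step uses its own adjacency matrix, which is the time-variant behaviour described above.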

[1] Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L., & Savarese, S. (2016). Social LSTM: Human trajectory prediction in crowded spaces. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December, 961–971.
[2] Yan, S., Xiong, Y., & Lin, D. (2018). Spatial temporal graph convolutional networks for skeleton-based action recognition. 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 7444–7452.

from social-stgcnn.

abduallahmohamed commented on August 16, 2024

Hi,

  1. We do not use a regular TCN in our work; we only refer to the concept and relate it to the TXPCNN layer, which treats the temporal dimension as a feature channel, unlike a TCN, which treats temporal data as pixel values.
    The einsum is the Einstein summation, a concept you can look up. What we do here is collapse the graph sequence into a single representation of shape <time, ped, features> using the graphs and their corresponding adjacency matrices (A). In other words, we use A to weight the features that neighboring pedestrians contribute to a specific pedestrian.

  2. Social-STGCNN is not a deterministic model. As the loss function in the paper shows, we model the trajectory as a bivariate Gaussian distribution and predict its five parameters for each trajectory at each time step: mean_x, mean_y, variance_x, variance_y, and correlation_xy.
    Because the model predicts a distribution, you can sample multiple trajectories from it. In our testing we sample 20 trajectories, as this is the community standard for this kind of problem.
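
As an illustration of sampling from the five predicted parameters, here is a minimal standard-library sketch for a single time step. The function name sample_positions is hypothetical, and it assumes var_x and var_y are variances, as the parameter names above suggest; it is not the repository's sampling code.

```python
import math
import random


def sample_positions(mean_x, mean_y, var_x, var_y, corr_xy,
                     num_samples=20, rng=None):
    """Draw (x, y) samples from a bivariate Gaussian at one time step."""
    rng = rng or random.Random()
    std_x = math.sqrt(var_x)  # assumption: inputs are variances, not std devs
    std_y = math.sqrt(var_y)
    samples = []
    for _ in range(num_samples):
        # Two independent standard normal draws.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mix z1 into y so that x and y have correlation corr_xy
        # (Cholesky factor of the 2x2 covariance matrix).
        x = mean_x + std_x * z1
        y = mean_y + std_y * (
            corr_xy * z1 + math.sqrt(1.0 - corr_xy ** 2) * z2)
        samples.append((x, y))
    return samples


# 20 candidate positions for one pedestrian at one predicted time step.
candidates = sample_positions(1.0, 2.0, 0.25, 0.25, 0.5,
                              rng=random.Random(0))
print(len(candidates))  # 20
```

Repeating this for every predicted time step yields one sampled trajectory per draw index.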

Thanks

abduallahmohamed commented on August 16, 2024

Hi,
Thanks for pointing this out. I re-ran the experiments with your suggested change and obtained similar results, which makes sense. I have updated the repo accordingly.

d-zh commented on August 16, 2024

Hi,
Which commit corresponds to the version used for your paper published at CVPR 2020?
Thanks!

abduallahmohamed commented on August 16, 2024

@d-zh https://github.com/abduallahmohamed/Social-STGCNN/tree/ebd57aaf34d84763825d05cf9d4eff738d8c96bb
