Comments (8)

okankop commented on July 19, 2024

The first dimension of the tensor is always reserved for the batch size, so to load a single clip you need to add a leading dimension of size 1. You are also concatenating the frames along the wrong dimension. The final tensor shape should be [1, 3, 16, h, w], so that the line 'x_2d = input[:, :, -1, :, :]' in "model.py" can take the last frame of the clip.

For successful inference, the clip also needs the same preprocessing (normalization, etc.) as in the test phase.
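
As a rough illustration of the shape described above, here is a minimal NumPy sketch that stacks 16 frames into a [1, 3, 16, h, w] array. The function name `frames_to_clip`, the 224 x 224 frame size, and the divide-by-255 scaling are assumptions made for the example; they are not taken from the YOWO code, and the real preprocessing should match whatever the test phase does.

```python
import numpy as np

def frames_to_clip(frames):
    """Stack H x W x 3 uint8 frames into a float32 array of shape [1, 3, T, H, W]."""
    clip = np.stack(frames, axis=0)           # [T, H, W, 3]
    # ASSUMPTION: placeholder scaling only; replace with the test-phase preprocessing.
    clip = clip.astype(np.float32) / 255.0
    clip = clip.transpose(3, 0, 1, 2)         # [3, T, H, W]: channels first, then time
    return clip[np.newaxis, ...]              # [1, 3, T, H, W]: add the batch dimension

# Dummy frames standing in for decoded video frames (hypothetical size).
frames = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(16)]
clip = frames_to_clip(frames)
print(clip.shape)       # (1, 3, 16, 224, 224)

# The same slicing as in the quoted model.py line picks the last frame of the clip.
key_frame = clip[:, :, -1, :, :]
print(key_frame.shape)  # (1, 3, 224, 224)
```

The array can then be converted with `torch.from_numpy(clip)` before it is passed to the model.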

from yowo.

wei-tim commented on July 19, 2024

@jinfagang
Thanks for your interest. Our 3D-CNN model extracts spatio-temporal information from an input clip of several successive frames, so you need to concatenate the frames (8 or 16 of them) into a clip.

lucasjinreal commented on July 19, 2024

How do I specify whether to use 2D or 3D? It seems both are used by default. Does 8/16 mean 8 to 16 frames?

wei-tim commented on July 19, 2024

@jinfagang
The 3D model helps to understand an action, while the 2D model boosts localization precision. Our algorithm fuses 3D and 2D information to perform spatio-temporal localization. If only a single model is used, the results are worse. You can find the corresponding ablation study in our paper.

We provide two options: 8 frames or 16 frames. The 8-frame model performs slightly worse than the 16-frame model but is more efficient. The experimental results are also presented in the paper.

lucasjinreal commented on July 19, 2024

Thanks, I got it. Does that mean the input video needs at least 16 frames for inference?

wei-tim commented on July 19, 2024

@jinfagang
For the model with 16 frames, yes.
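
One way to picture the 16-frame requirement on a longer video is a sliding window that only starts producing clips once 16 frames have been seen. The sketch below is an assumption about how a caller might buffer frames, not the repository's own data loader; `sliding_clips` and the stride of one frame are hypothetical choices.

```python
from collections import deque

CLIP_LEN = 16  # 16 for the 16-frame model; 8 for the 8-frame model

def sliding_clips(frames, clip_len=CLIP_LEN):
    """Yield the most recent clip_len frames once enough frames have arrived."""
    window = deque(maxlen=clip_len)  # the oldest frame is dropped automatically
    for frame in frames:
        window.append(frame)
        if len(window) == clip_len:
            yield list(window)

# Integers stand in for decoded frames of a 20-frame video.
clips = list(sliding_clips(range(20)))
print(len(clips))    # 5
print(clips[0][-1])  # 15
print(clips[-1][0])  # 4
```

A video shorter than `clip_len` frames yields no clips at all, which matches the answer above for the 16-frame model.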

kinivi commented on July 19, 2024

@wei-tim can I manually edit clip size to 32?

GxZhu commented on July 19, 2024

@jinfagang Running into the same error as you. I read in an image frame as an np.array and made the shape of the image [3,h,w]. Then I concatenated 16 consecutive frames into an array with shape [16, 3, h, w] before converting to Tensor.

I am still missing a dimension (the shape has length 4, not 5). Did you find a fix?
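
Going by the [1, 3, 16, h, w] layout described at the top of the thread, an array of shape [16, 3, h, w] needs two changes: swap the frame and channel axes, then add a batch axis. The following is a minimal NumPy sketch of that reshaping; the random data and the 224 x 224 size are placeholders.

```python
import numpy as np

h, w = 224, 224  # hypothetical frame size
frames_first = np.random.rand(16, 3, h, w).astype(np.float32)  # [T, C, H, W]

clip = np.transpose(frames_first, (1, 0, 2, 3))  # [C, T, H, W]
clip = np.expand_dims(clip, axis=0)              # [1, C, T, H, W]
print(clip.shape)  # (1, 3, 16, 224, 224)

# Sanity check: the last temporal slice is still the last original frame.
assert np.array_equal(clip[0, :, -1], frames_first[-1])
```

If the data is already a PyTorch tensor, `tensor.permute(1, 0, 2, 3).unsqueeze(0)` performs the same two steps.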
