
orvit's Issues

How to do inference on a new video?

When I run inference on the validation set, it works and reports the prediction accuracy. That is expected, since we have annotations for the validation set. But for the test set or a new input video, detection has to be done before classification. Could you please advise how to run inference on a new video for which detected bounding boxes are unavailable?

Thanks in advance

Object Region Attention block

After RoIAlign, the code uses a different method than the one described in the paper: the paper applies the MLP after max pooling, whereas the code applies the MLP first and then max pooling. A toy comparison is sketched below.
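A minimal sketch of the two orderings (layer sizes and shapes here are my own placeholders, not taken from the repo):

```python
import torch
import torch.nn as nn

N, d, r = 4, 768, 7                     # objects, channel dim, RoIAlign grid
roi_feats = torch.randn(N, d, r, r)     # stand-in for RoIAlign output
mlp = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))

# Paper ordering (as described): max-pool over the r x r grid, then MLP.
pooled = roi_feats.flatten(2).max(dim=-1).values      # (N, d)
paper_tokens = mlp(pooled)

# Code ordering (as implemented): MLP on each grid cell, then max-pool.
cells = roi_feats.flatten(2).transpose(1, 2)          # (N, r*r, d)
code_tokens = mlp(cells).max(dim=1).values            # (N, d)
```

Since the MLP is nonlinear, the two orderings are not equivalent in general.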

Problem about the bad baseline result

Hello, we ran the baseline model on the Something-Else dataset under the ORViT framework, but the best accuracy we obtained is only 49.6, far from the 60.2 reported in the paper. We made no changes to the model's config file (Smthelse_ORViT-MF_224_16x4.yaml) except adjusting batch_size to 16 (4 GPUs).

Code for Diving48

Hi. Would it be possible to publish the corresponding code (e.g. the configs) for Diving48 as well? Thanks!

Detection boxes for the Something-Something v2 dataset

I found the detected bounding boxes for the Something-Something v2 dataset here (https://drive.google.com/drive/folders/1XqZC2jIHqrLPugPOVJxCH_YWa275PBrZ), but they are split across four JSON files, namely:
bounding_box_smthsmth_part1.json
bounding_box_smthsmth_part2.json
bounding_box_smthsmth_part3.json
bounding_box_smthsmth_part4.json
Could you please confirm whether these are the files I should supply in place of the detected boxes? If a different file is required, could you please provide the link to download it?

I guess the detection boxes have to be stored per video, named by video id, because I get this error:
No such file or directory: '../database/something-something/something-something-v2/detected_boxes/74225'

Is there a script that splits the detected boxes into one file per video? Something like the sketch below is what I have in mind.
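A minimal sketch, assuming each part file is a JSON object mapping video id to that video's annotations (the grouping key would need to change if the real schema is a flat list):

```python
import json
import os

parts = [f"bounding_box_smthsmth_part{i}.json" for i in range(1, 5)]
out_dir = "detected_boxes"
os.makedirs(out_dir, exist_ok=True)

for path in parts:
    with open(path) as f:
        data = json.load(f)
    # assumption: data maps video id -> annotations for that video
    for video_id, anns in data.items():
        # one file per video, named by video id (e.g. detected_boxes/74225)
        with open(os.path.join(out_dir, str(video_id)), "w") as f:
            json.dump(anns, f)
```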

Thanks in advance

Model weights on SthElse

Hi,
Thank you for the wonderful work. Do you have plans to release the model trained on Something-Else?

Extracting bounding boxes for the remaining SSv2 videos

Hi, for SSv2, the annotations zip folder contains the bounding boxes for ~180k videos. To obtain them for the rest of the videos, could you please provide details about the setup you used to produce the detections? Running a pretrained Faster R-CNN (detectron2) directly on the test set yields poor results (roughly the pipeline sketched below). Please advise on the network architecture, how to fine-tune, the hyperparameters, the class labels used, etc. It would also be helpful if you could provide the annotations for the remaining test videos.
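For reference, this is roughly what I tried: a stock COCO-pretrained Faster R-CNN from detectron2, applied per frame with no fine-tuning (the frame path is a placeholder):

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # confidence cutoff
predictor = DefaultPredictor(cfg)

frame = cv2.imread("frame_0001.jpg")          # placeholder frame path
outputs = predictor(frame)
boxes = outputs["instances"].pred_boxes.tensor.cpu().numpy()
scores = outputs["instances"].scores.cpu().numpy()
```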
Thanks!

Training on the AVA dataset

Hello, thank you for your work. How can I use ORViT to train on the AVA dataset? Have you replicated the training of MViT on AVA? I found mvit.yaml in your configuration files.

Model file for run_net.py

Hi,

Thanks for the amazing work. Where can I find the model k600_motionformer_224_16x4.pyth? On the TimeSformer website, there doesn't seem to be a model with exactly the name referenced in configs/ORViT/Smthelse_ORViT-MF_224_16x4.yaml. Never mind, I found it here: Motionformer. The Motionformer link in README.md points to TimeSformer, which caused the confusion.

Thanks,
Nirat

Object Region Attention

Hello,
In the paper, it is mentioned that in the ORViT block, the object-region attention is carried out with different q, k, and v values: q is set to the patch tokens, while k and v are set to the concatenated tokens from the patches and the object regions.

$X \in \mathbb{R}^{THW \times d}$, $C \in \mathbb{R}^{T(HW+O) \times d}$

So, in the object-region attention, it should be (according to the paper): $Q = XW_q$, $K = CW_k$, $V = CW_v$.

However, in the code, I see that the concatenated tokens are passed to the trajectory attention module:

`all_tokens, thw = self.attn(`

Also, in the trajectory attention module (`class TrajectoryAttention(nn.Module)`), the q, k, and v values are all computed from the same concatenated tokens.

Can you please help me understand this? I can't find where the original patch tokens are set as q for the trajectory attention mechanism; a toy comparison of the two readings is sketched below.
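A single-head toy sketch of the two variants as I understand them (all shapes, sizes, and projection names are my own placeholders, not taken from the repo):

```python
import torch
import torch.nn as nn

B, THW, TO, d = 2, 1568, 32, 768       # patch tokens, object tokens, width
x = torch.randn(B, THW, d)             # patch tokens X
c = torch.cat([x, torch.randn(B, TO, d)], dim=1)  # concatenated tokens C

wq, wk, wv = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)

# Paper's equations as I read them: Q from X only, K/V from C,
# so only the THW patch tokens are updated.
q, k, v = wq(x), wk(c), wv(c)
out_paper = torch.softmax(q @ k.transpose(-2, -1) / d**0.5, dim=-1) @ v
print(out_paper.shape)                 # torch.Size([2, 1568, 768])

# What the code seems to do: feed C itself to trajectory attention,
# so q, k, v are all projections of the same concatenated tokens.
# (Plain attention here; the trajectory-specific reweighting is omitted.)
q2, k2, v2 = wq(c), wk(c), wv(c)
out_code = torch.softmax(q2 @ k2.transpose(-2, -1) / d**0.5, dim=-1) @ v2
print(out_code.shape)                  # torch.Size([2, 1600, 768])
```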

Thanks :)

Regarding the objects selected as input

Hi, nice to meet you! Great work!
I wonder whether there is a reason for selecting 4 objects per frame in EPIC-Kitchens, given that a single frame can clearly contain more than 10 objects. In that case, how did you decide which 4 objects' information to incorporate into the model?
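For instance, is it something like keeping the top-scoring detections? A guess of mine (not taken from the repo):

```python
import numpy as np

def select_boxes(boxes, scores, num_obj=4):
    """Keep the num_obj highest-scoring boxes per frame, zero-padding
    when fewer are detected; one plausible policy, not necessarily yours."""
    order = np.argsort(scores)[::-1][:num_obj]
    out = np.zeros((num_obj, 4), dtype=np.float32)
    out[: len(order)] = boxes[order]
    return out
```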
