Thank you for releasing the excellent codebase. Could you please point out where this is in the repository, or post the code, for generating visualizations of the last layer that show the model's focus regions?
Hello. Thank you for sharing this great work. I was trying to reproduce the evaluation results on the K400 test set using Video-FocalNet-T. The README reports a top-1 accuracy of 79.8%, but I am getting around 65%.
Here is the code we used to evaluate Video-FocalNet on the K400 test set (19796 samples). Note that this code uses no cropping; I have also tried crop=3, but it did not help. The gap between the reported score and the computed score is too large.
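To make the metric being compared unambiguous, here is a minimal sketch of top-1 accuracy with scores averaged over several views (crops) per video. It is not the repository's evaluation script and not the code used above; the function names `average_views` and `top1_accuracy` and the toy scores are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def average_views(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over all views (crops) of a single video."""
    num_views = len(view_scores)
    num_classes = len(view_scores[0])
    return [sum(v[c] for v in view_scores) / num_views for c in range(num_classes)]


def top1_accuracy(
    scores_per_video: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[int],
) -> float:
    """Top-1 accuracy in percent; scores_per_video[i][j] holds the class
    scores for view j of video i (hypothetical layout, for illustration)."""
    if len(scores_per_video) != len(labels):
        raise ValueError("number of videos and labels must match")
    correct = 0
    for views, label in zip(scores_per_video, labels):
        avg = average_views(views)
        # Predicted class is the index of the highest averaged score.
        pred = max(range(len(avg)), key=avg.__getitem__)
        correct += int(pred == label)
    return 100.0 * correct / len(labels)


# Toy data: 2 videos, 3 views each, 3 classes.
toy_scores = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.7, 0.2, 0.1]],  # averaged argmax: class 0
    [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]],  # averaged argmax: class 2
]
toy_labels = [0, 1]
print(top1_accuracy(toy_scores, toy_labels))
```

With a single view per video, this reduces to plain argmax accuracy, which corresponds to the no-crop setting described above.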
Hello! Thank you for your wonderful work! I was trying to use your model for my action recognition task in Google Colab, but ran into problems with dependencies.
I couldn't use conda, so I changed every "conda" command to "pip", e.g. "conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch" -> "pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu113". When I ran the script, importing mmcv failed. I couldn't find out which version of mmcv I need.
Can you help me with these problems? Thank you!