anirudh257 / strm

[CVPR 2022] Official PyTorch implementation of "Spatio-temporal Relation Modeling for Few-shot Action Recognition". State-of-the-art results for few-shot action recognition.

Home Page: https://anirudh257.github.io/strm/

Python 96.16% Shell 3.84%
few-shot few-shot-learning few-shot-recognition cvpr2022 cvpr action-recognition spatio-temporal

strm's Introduction

Spatio-temporal Relation Modeling for Few-shot Action Recognition (CVPR 2022)


[Paper][Project Page]

Installation

The codebase is built on PyTorch 1.9.0, tested on Ubuntu 18.04 (Python 3.8.8, CUDA 11.0), and trained on 4 GPUs. Build a conda environment using the requirements given in environment.yaml.
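A typical environment setup looks like the following sketch; the file name follows the README (some issues below refer to it as environment.yml), and the environment name used here is an assumption, so activate whatever name the file itself defines:

    # create the conda environment from the provided file
    conda env create -f environment.yaml
    # "strm" is an assumed environment name; use the name defined inside the yaml file
    conda activate strm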

Attention Visualization

Results

Method        Kinetics   SSv2   HMDB   UCF
CMN-J         78.9       -      -      -
TARN          78.5       -      -      -
ARN           82.4       -      60.6   83.1
OTAM          85.8       52.3   -      -
HF-AR         -          55.1   62.2   86.4
TRX           85.9       64.6   75.6   96.1
STRM [Ours]   86.7       68.1   77.3   96.8

Training and Evaluation

Step 1: Data preparation

Prepare the datasets according to the splits provided.
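The provided splits only define which videos belong to each set; as a hedged sketch (several issues below mention extracting frames with ffmpeg), a per-video extraction command could look like this, with the paths and the JPEG quality (-q:v) or any frame-rate option (-r) being illustrative choices rather than the authors' settings:

    # hypothetical paths; adjust -q:v / -r to your own preprocessing choices
    mkdir -p frames/brush_hair/clip_0001
    ffmpeg -i videos/brush_hair/clip_0001.avi -q:v 2 frames/brush_hair/clip_0001/%06d.jpg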

Step 2: Training

Use the training scripts provided in the scripts directory. An example invocation is shown below.
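For reference, the HMDB training invocation reported by users in the issues below has the following shape; the checkpoint directory, GPU count, and iteration budget are their values, not necessarily the authors' exact settings:

    python3 run.py -c checkpoint_dir_hmdb/ --query_per_class 4 --shot 5 --way 5 \
        --trans_linear_out_dim 1152 --test_iters 10000 --dataset hmdb --split 3 \
        -lr 0.0001 --method resnet50 --img_size 224 --scratch new --num_gpus 4 \
        --print_freq 1 --save_freq 10000 --training_iterations 20010 --temp_set 2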

Step 3: Evaluation

  1. Use the evaluation script as given in eval_strm_ssv2.sh

  2. Download the checkpoints from these links: SSV2, Kinetics, HMDB, UCF
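To evaluate a downloaded checkpoint, a command of the following shape appears in the issues below; the --test_model_only and --test_model_path flags select evaluation of an existing model, and the paths are examples:

    python3 run.py -c checkpoint_dir_hmdb/ --query_per_class 4 --shot 5 --way 5 \
        --trans_linear_out_dim 1152 --test_iters 10000 --dataset hmdb --split 3 \
        -lr 0.0001 --img_size 224 --scratch new --num_gpus 1 --method resnet50 \
        --save_freq 10000 --print_freq 1 --training_iterations 20010 --temp_set 2 \
        --test_model_only True --test_model_path checkpoints/checkpoint_hmdb.pt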

Citation

If you find this repository useful, please consider giving a star ⭐ and citation 🎊:

@inproceedings{thatipelli2021spatio,
  title={Spatio-temporal Relation Modeling for Few-shot Action Recognition},
  author={Thatipelli, Anirudh and Narayan, Sanath and Khan, Salman and Anwer, Rao Muhammad and Khan, Fahad Shahbaz and Ghanem, Bernard},
  booktitle={CVPR},
  year={2022}
}

Acknowledgements

The codebase was built on top of trx. Many thanks to Toby Perrett for previous work.

Contact

Should you have any questions, please contact 📧 [email protected] or message me on LinkedIn.

strm's People

Contributors

anirudh257


strm's Issues

doubt about validation dataset

Hi! Thanks for your excellent work! I'm interested in few-shot action recognition too, but I am confused about how to find the best iteration during the meta-training phase using the validation dataset. Your work is based on TRX, which I've tried to reproduce, and I find that both TRX and STRM don't use the validation dataset explicitly in the code; instead, the final iteration is used directly.
In my own experiments, I found that the accuracy on the validation set is lower than the accuracy on the test set, and if I set the validation frequency to every 1000 meta-training tasks, the validation accuracy fluctuates up and down. Have you met this issue during your experiments?
Works in this area are so few that I can't find a reasonable explanation, so I would appreciate it if you could resolve my confusion.

When training, loss is NaN

Hello, I have been running your code recently, but there are some problems I cannot solve. I hope you can help me.
When I train, the loss value becomes NaN. After debugging, I found that after a certain number of iterations the output features of the ResNet become NaN. I don't know why this happens or how to solve it. Can you help me?

Doing inference with trained model

Hi, thank you for sharing the project code!

I have trained a model on my custom dataset.
How can I modify the script to do inference? For example, given a sample input, I want the model to give me the most probable class among all the support classes.

question in extracting frames

Hi, thank you for your excellent work.
I'm also interested in this research field, but I have several questions about preparing the dataset.

  1. How do you extract the videos into frames? Did you use ffmpeg?
  2. If you used ffmpeg to extract the frames, can you share the details, such as the fps (-r) you extracted each video at and the quality (-q:v)?

Thank you so much!

multi-label prediction

Hi,
This is nice work.
The model in the paper, STRM, performs single-label prediction.
How can it be used to predict multi-label data?

model.py

Hi,
Thanks for sharing the code. I tried to run model.py to look through the architecture of the model and test whether it works.
This error occurs:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
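For context, a hedged sketch of the usual cause rather than an official fix: this error typically means the module's parameters and the test input ended up on different devices, and moving both onto one device before the forward pass avoids it.

    import torch
    import torch.nn as nn

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = nn.Linear(10, 2).to(device)   # stand-in for the module built in model.py
    x = torch.randn(4, 10).to(device)     # stand-in for the test input
    out = model(x)                        # both live on `device`, so no mismatch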

Training environment issues

I would like to know whether the training environment must match the configuration in the environment.yml you provided in order to get the results in the paper.

Some videos are missing in the Kinetics Dataset.

Thank you for sharing your code. It's a wonderful work!

  1. I created the FSL-Kinetics split from the Kinetics-400 dataset following the kinetics_CMN/*.txt splits and found that some videos are missing from Kinetics-400. Could you please provide links to these videos?
  2. By the way, I used the incomplete FSL-Kinetics to retrain your model and got the following result, which is slightly lower than that reported in the paper (86.7). Is this caused by the missing videos?
{'kinetics': {'accuracy': 85.38, 'confidence': 0.32, 'loss': 0.00029}}

Looking forward to your reply!

about dataset and training

Hello! Thank you for your excellent work. I now have the HMDB51 dataset; how do I split it according to the splits you provided? And how should I modify the .sh file for training afterwards? Could you elaborate? Looking forward to your reply, thank you very much!

the Kinetics-100 dataset

Nice to meet you. I have recently been searching for the Kinetics dataset, but a lot of videos are always missing, and it has cost me a lot of time. Could you share the Kinetics-100 dataset? My mail is [email protected]. Thank you!

datasets and ValueError: Sample larger than population or is negative

Hi there, I have organized my HMDB dataset into JPGs like this:

./video_datasets/data/hmdbjpg/brush_hair/April_09_brush_hair_u_nm_np1_ba_goo_0/April_09_brush_hair_u_nm_np1_ba_goo_0_1.jpg

./video_datasets/data/hmdbjpg/brush_hair/April_09_brush_hair_u_nm_np1_ba_goo_0/April_09_brush_hair_u_nm_np1_ba_goo_0_2.jpg
...
and my training command is
python3 run.py -c checkpoint_dir_hmdb/ --query_per_class 4 --shot 5 --way 5 --trans_linear_out_dim 1152 --test_iters 10000 --dataset hmdb --split 3 -lr 0.0001 --method resnet50 --img_size 224 --scratch new --num_gpus 1 --print_freq 1 --save_freq 10000 --training_iterations 20010 --temp_set 2
The error is:

File "/home/tcc/anaconda3/envs/torch1.9/lib/python3.8/random.py", line 363, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

Have you met this issue before, and how can I solve it?
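For context, an assumption about the cause rather than a confirmed diagnosis: Python's random.sample raises this exact error whenever it is asked for more items than the population contains, which here would happen if some clip folder holds fewer frames than the sampler requests.

    import random

    frames = ["f1.jpg", "f2.jpg"]   # a clip folder with only two extracted frames
    random.sample(frames, 8)        # ValueError: Sample larger than population or is negative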

About train_output.log

Hi Anirudh!
I'm sorry I forgot to reply to your email last time, but after reading your reply I went and compiled the SSv2 dataset, and it's ready.
If you need it, I can provide you with the download address of the file, and if anyone asks about this dataset in the future, you can provide it directly.
I would also like to bother you again: when I ran the code, the program told me that a file could not be found.
train_logger = setup_logger('training_accuracy', './runs_trx_ssv2_tempset_2_gamma_hier_patch_16_lin_bot_mlp_new_dist_loss_clsw_relu_no_pe_drop_frame_mlp_mix_run1/train_output.log')
eval_logger = setup_logger('Evaluation_accuracy', './runs_trx_ssv2_tempset_2_gamma_hier_patch_16_lin_bot_mlp_new_dist_loss_clsw_relu_no_pe_drop_frame_mlp_mix_run1/eval_output.log')
They are located at lines 32 and 35 of run.py.
I would like to ask what these two files are. Would you be comfortable providing them?
Thank you so much!

the number of frames chosen

Hello, can you tell me how you extracted the JPG images from the AVI-format videos and how many frames are chosen per video? The git project from TRX produces an empty folder.

Accuracy issues

Hello, I have currently tested only one dataset, UCF101, by loading your model, but I did not get an accuracy of 96.9, only 96.308. I have a feeling it may be a dataset issue or a luck issue, so I would like to get the code you used to build the dataset.

Inconsistent dataset directory structure compared to the official UCF101

The UCF101 dataset you are using is in .zip format, while the dataset I downloaded from the official website is in .rar format, and the directory structure on the official website is different from the one you are using. Can you upload your dataset to Google Drive?

Test question

Hi, it's nice to see the work you do, but I'm having a little difficulty.
I use your pre-trained model to test on HMDB51. What if I only want to test videos of one class? I set way to 1, but then the loss is 0 and the accuracy is 1.

dataset question

Hello,
There are some questions about the code.

I used the command, but the terminal returned an error. I cannot find the file "hmdb51_256q5.zip".

Thanks,


Accuracy reproduction issues

Dear author,
I processed the video frames via ffmpeg and used your txt files for the training, test, and validation sets. In this case, loading the final model you uploaded and testing on the test set never gives me the accuracy in the paper. What could be the problem?

Best regards.

Attention Visualization

Hi,

Thank you for your nice work!
I'd like to know how to visualize attention like Figure 1 in your paper.
Is there any code for visualization?

Thanks,

A question about two log files called train_output.log and eval_output.log.

Hello, in run.py there are two lines

train_logger = setup_logger('Training_accuracy', './runs_strm/train_output.log')

eval_logger = setup_logger('Evaluation_accuracy', './runs_strm/eval_output.log')

I wonder where train_output.log and eval_output.log are. I mean, do I get them after training the model, or from the previous work (TRX)?
Hoping for your early reply. Thank you!

accuracy

Dear author,
Does the paper include the confidence values for the accuracy on the four datasets?
In addition, what is the seed for each of the four datasets?

Missing key(s) in state_dict

Hello,

I encountered an issue while using the provided checkpoints from the linked Google Drive. Several keys related to the Transformer are missing from the state dictionary:

Missing key(s) in state_dict: "transformers.1.pe.pe", "transformers.1.k_linear.weight", "transformers.1.k_linear.bias", "transformers.1.v_linear.weight", "transformers.1.v_linear.bias", "transformers.1.norm_k.weight", "transformers.1.norm_k.bias", "transformers.1.norm_v.weight", "transformers.1.norm_v.bias"

I tried both checkpoints for Kinetics and SSV2.

Could you please provide updated checkpoints?
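A hedged diagnostic sketch, not an official fix: PyTorch's load_state_dict with strict=False reports which keys are missing or unexpected, which helps check whether a checkpoint matches the model configuration being instantiated; `model` below stands in for however run.py constructs the STRM network, and the checkpoint path and "model_state_dict" wrapping key are assumptions.

    import torch

    checkpoint = torch.load("checkpoints/checkpoint_kinetics.pt", map_location="cpu")
    state_dict = checkpoint.get("model_state_dict", checkpoint)  # unwrap if the file stores a dict of dicts
    result = model.load_state_dict(state_dict, strict=False)     # `model`: the instantiated network
    print("missing keys:", result.missing_keys)
    print("unexpected keys:", result.unexpected_keys)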

ssv2

Hi, when working with the SSv2 dataset, how should I handle the fact that the same index in the txt files of the three splits (train, test, and val), for example train 10 and test 10, represents different categories?

Q about few-shot action recognition

Hello, I have a question about the principle. Although this paper uses N-way K-shot training, the test set does not contain actions from the training set. Can this still be called a few-shot problem? Isn't it a zero-shot problem?

Some confusion

We used ffmpeg to extract the videos in HMDB51 at 8 fps, set the number of iterations to 40000, and kept the rest of the parameters at the defaults in your code. I've read the previous issues and preprocessed the dataset as mentioned, but did not get the expected accuracy; the difference was about 4%. What may be the reason?
Thank you very much!

parameter scratch

Hello, I didn't quite understand the meaning of the scratch parameter. Why were three different args.scratch options created, and if bp is meant to modify two parameters, what is the difference between new and bc? Also, this causes multiple copies of the data to be copied to different folders.

There is a gap with the reported result on HMDB

Thank you for sharing your code. It's a wonderful work!
When I run the test code on HMDB using the checkpoint you provided, I get an accuracy of 74%, smaller than the reported one (77.3).
I wonder if you could tell me the possible reasons?

This is the script I used:
python3 run.py -c checkpoint_dir_hmdb/ --query_per_class 4 --shot 5 --way 5 --trans_linear_out_dim 1152 --test_iters 10000 --dataset hmdb --split 3 -lr 0.0001 --img_size 224 --scratch new --num_gpus 4 --method resnet50 --save_freq 10000 --print_freq 1 --training_iterations 20010 --temp_set 2

about sh

Hello, when I run the .sh file I get the error ./run_strm_hmdb.sh: line 19: srun: command not found. What is the reason for this?

ValueError: not enough values to unpack (expected 3, got 0)

envs/env_ani/lib/python3.9/site-packages/torch/_utils.py", line 722, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.

video_reader.py", line 343, in __getitem__
    target_set, target_labels, real_target_labels = zip(*t)
ValueError: not enough values to unpack (expected 3, got 0)

I tried to evaluate on HMDB51 with the command: nohup python3 run.py -c checkpoint_dir_hmdb_425/ --query_per_class 4 --shot 5 --way 5 --trans_linear_out_dim 1152 --test_iters 10000 --dataset hmdb --split 3 -lr 0.0001 --img_size 224 --scratch new --num_gpus 2 --method resnet50 --save_freq 10000 --print_freq 1 --training_iterations 20010 --temp_set 2 --test_model_only True --test_model_path checkpoints/checkpoint_hmdb.pt > 425.log 2>&1 &
I checked the log file, and when it finished Task [341/20010], Train Loss: 0.2502708, Train Accuracy: 0.9444445, the error mentioned above occurred. How can I solve it? Does it have something to do with the HMDB51 dataset I obtained being different from yours?
Or is the default dataloader parameter num_workers=10 too big?

Preprocessing about UCF 101

Hi, I have a question about the read_dir function in video_reader.py.

I downloaded UCF-101 and renamed it UCF-101_320.zip. There are 101 class folders. But I don't understand the meaning of img_list = [x for x in self.zfile.namelist() if '.jpg' in x]. As I understand it, this line makes a list of all the .jpg files in the zip file.

Are there any .jpg files in UCF-101_320.zip?

Or would you share the zip file to my email address [email protected] ?

Thank you.
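For context, an assumption based on the quoted line rather than the authors' documented preprocessing: the loader appears to expect per-frame .jpg files inside the zip, so a zip built directly from the official .avi files would yield an empty img_list. A minimal sketch of packing already-extracted frames into such a zip:

    import os
    import zipfile

    frames_root = "UCF-101_320_frames"   # hypothetical folder laid out as <class>/<video>/<frame>.jpg
    with zipfile.ZipFile("UCF-101_320.zip", "w", zipfile.ZIP_STORED) as zf:
        for dirpath, _, filenames in os.walk(frames_root):
            for name in sorted(filenames):
                if name.endswith(".jpg"):
                    path = os.path.join(dirpath, name)
                    # store paths relative to the root so namelist() entries look like class/video/frame.jpg
                    zf.write(path, arcname=os.path.relpath(path, frames_root))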

Five categories of samples are missing in the Kinetics-100 dataset

Hi Anirudh, I am here again.
Compared to the splits of Kinetics, I found that my Kinetics dataset is missing 5 categories: "cleaning_floor", "dying_hair", "opening_bottle", and "using_computer" in the train set, and "hurling_(support)" in the test set.
Also, there are five extra categories: "falling_off_bike", "opening_door", "playing_darts", "tapping_guitar", and "walking_the_door".
Could you send me the 5 missing categories, please? Or should I replace these categories in the splits?

Questions about FLOPs

Hello, your paper contains this sentence: "Moreover, it is worth mentioning that our STRM is comparable to TRX in terms of compute, requiring only ∼4% additional FLOPs." Can I ask what the exact FLOPs value for your model is?

Question about 1-shot

Hi,
Thanks for sharing the code. I tried to get experimental results in the 1-shot setting for both 5-way and 20-way configurations. The accuracy is always low on both the training and test sets and stops improving after a couple of iterations. I think I am using the wrong parameters. Could you please share your parameters for the 1-shot results? Have you obtained any results for the 20-way setting? Moreover, if you have any suggestions or tips on getting 1-shot results for both 5-way and 20-way, they would be highly appreciated.
In the paper, under the experiments with different numbers of support samples, there is the sentence "additional results are provided in the supplementary", but I couldn't find these results in the supplementary. I think I am missing something, so I would be very happy if you could direct me to those additional results.
Thanks again for the great work.
Best,
