
soczech / lookforthechange


Code for Look for the Change paper published at CVPR 2022

Home Page: https://data.ciirc.cvut.cz/public/projects/2022LookForTheChange/

License: MIT License

Languages: Python 81.29%, Cuda 14.93%, C++ 1.90%, Dockerfile 1.88%
Topics: video, computer-vision, object-localization, action-localization

lookforthechange's Introduction

Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos

This repository contains code for the CVPR'22 paper Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos.

Run the model on your video

  1. Prerequisites

    • NVIDIA Docker (i.e., only Linux is supported; alternatively, you can set up the environment from the Dockerfile manually, or run without a GPU, in which case you do not need Docker or Linux).
    • Tested on GPUs with at least 4 GB of VRAM.
  2. Download model weights

    • mkdir weights; cd weights
      wget https://data.ciirc.cvut.cz/public/projects/2022LookForTheChange/look-for-the-change.pth
      wget https://isis-data.science.uva.nl/mettes/imagenet-shuffle/mxnet/resnext101_bottomup_12988/resnext-101-1-0040.params
      wget https://isis-data.science.uva.nl/mettes/imagenet-shuffle/mxnet/resnext101_bottomup_12988/resnext-101-symbol.json
      mv resnext-101-symbol.json resnext-101-1-symbol.json
      
  3. Setup the environment

    • Our code can be run in a docker container. Build it by running the following command. Note that by default, we compile custom CUDA code for architectures 6.1, 7.0, 7.5, and 8.0. You may need to update the Dockerfile with your GPU architecture.
      docker build -t look-for-the-change .
      
    • Start an interactive shell inside the Docker container.
      docker run -it --rm --gpus 1 -v $(pwd):$(pwd) -w $(pwd) look-for-the-change bash
      
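    • To verify the GPU is visible inside the container, a quick sanity check (this assumes PyTorch is installed in the image, which the custom CUDA op build implies):
      python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
      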
  4. Extract video features

    • Our model runs on pre-extracted features; run the following command to extract them.
      python extract.py path/to/video.mp4
      
      The script creates a path/to/video.pickle file with the extracted features.
    • Note: you may need to edit the TensorFlow memory_limit in feature_extraction/tsm_model.py if you have less than 6 GB of VRAM.
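    • To sanity-check the extracted features, a minimal inspection sketch (the exact layout of the pickle file is an assumption; print the keys and shapes rather than relying on a fixed structure):
      import pickle

      # Load whatever extract.py produced and report its top-level structure.
      with open("path/to/video.pickle", "rb") as f:
          data = pickle.load(f)
      if isinstance(data, dict):
          for key, value in data.items():
              print(key, getattr(value, "shape", type(value)))
      else:
          print(type(data), getattr(data, "shape", None))
      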
  5. Get predictions

    • Run the following command to get predictions for your video.
      python predict.py category path/to/video.pickle [--visualize --video path/to/video.mp4]
      
      where category is the ID of a dataset category, such as bacon for Bacon Frying. See the ChangeIt dataset categories for all options.
    • The script creates a path/to/video.category.csv file with raw model predictions for each second of the original video.
    • If a path to the original video is provided, the script also generates a visualization of the predictions.
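    • As an example of post-processing, a minimal sketch that locates the highest-scoring second in the prediction file (the column layout of the CSV is an assumption; check the actual output of predict.py and skip a header row if present):
      import csv

      # One row per second of video is assumed; which column holds the
      # action score is an assumption (action_col below is hypothetical).
      with open("path/to/video.category.csv") as f:
          rows = [[float(x) for x in row] for row in csv.reader(f)]
      action_col = 1  # hypothetical index of the action score
      best_second = max(range(len(rows)), key=lambda t: rows[t][action_col])
      print(f"highest action score at second {best_second}")
      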

Replicate our experiments

  1. Prerequisites

    • Set up the Docker environment and download the ResNeXt model weights as in steps 1–3 of the previous section.
    • Note that a GPU is required for training due to the custom CUDA op.
  2. Dataset preparation

    • Download the ChangeIt dataset videos. Note that it is not necessary to download the videos at the best available resolution, as only 224×224 px is needed for feature extraction.
    • Extract features from the videos.
      python extract.py path/to/video1.mp4 path/to/video2.mp4 ... --n_augmentations 10 --export_dir path/to/dataset_root/category_name
      
      This script creates path/to/dataset_root/category_name/video1.pickle and path/to/dataset_root/category_name/video2.pickle files with the extracted features. It is important to have a dataset_root folder containing category_name sub-folders with the individual video feature files.
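    • Before training, it may help to verify the expected dataset_root/category_name layout; a minimal check (the folder names are the ones used above):
      from pathlib import Path

      # Count the extracted .pickle feature files in each category sub-folder.
      root = Path("path/to/dataset_root")
      for category in sorted(p for p in root.iterdir() if p.is_dir()):
          n_files = len(list(category.glob("*.pickle")))
          print(f"{category.name}: {n_files} feature files")
      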
  3. Train a model

    • Run the following command to train on the pre-extracted features. Note that a separate training run is needed for every category. Also keep in mind that due to the unsupervised nature of the algorithm, you may end up in a bad local minimum. We recommend running the training multiple times to get the best results; a minimal launcher sketch follows below.
      python train.py --pickle_roots path/to/dataset_root
                      --category category_name
                      --annotation_root path/to/annotation_root
                      --noise_adapt_weight_root path/to/video_csv_files
                      --noise_adapt_weight_threshold_file path/to/categories.csv
      
    • --annotation_root is the location of the annotations folder of the ChangeIt dataset, --noise_adapt_weight_root is the location of the videos folder of the dataset, and --noise_adapt_weight_threshold_file points to the categories.csv file of the dataset.
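    • Since runs may converge to different optima, a minimal launcher sketch for several independent trainings (it assumes consecutive runs of train.py do not overwrite each other's outputs; adapt the output paths if they do):
      import subprocess

      # Launch several independent runs of the training command above;
      # due to the unsupervised objective, each may reach a different optimum.
      cmd = ["python", "train.py",
             "--pickle_roots", "path/to/dataset_root",
             "--category", "category_name",
             "--annotation_root", "path/to/annotation_root",
             "--noise_adapt_weight_root", "path/to/video_csv_files",
             "--noise_adapt_weight_threshold_file", "path/to/categories.csv"]
      for run in range(5):
          print(f"starting run {run}")
          subprocess.run(cmd, check=True)
      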

References

Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, and Josef Sivic. Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

@inproceedings{soucek2022lookforthechange,
    title={Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos},
    author={Sou\v{c}ek, Tom\'{a}\v{s} and Alayrac, Jean-Baptiste and Miech, Antoine and Laptev, Ivan and Sivic, Josef},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2022}
}

Acknowledgements

The project was supported by the European Regional Development Fund under the project IMPACT (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000468) and by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ (ID:90140), the French government under management of Agence Nationale de la Recherche as part of the "Investissements d'avenir" program, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute), and Louis Vuitton ENS Chair on Artificial Intelligence. We would also like to thank Kateřina Součková and Lukáš Kořínek for their help with the dataset.

lookforthechange's People

Contributors

jiuntian · soczech


lookforthechange's Issues

Feature extraction augment times

I noticed that feature extraction uses 10 augmentations, which means 11 sets of augmented features are stored for each video, and the extraction process takes very long. Does the augmentation matter much? I would like to run with 0 augmentations to speed up feature extraction; is that a bad idea?

Custom CUDA operation compilation

I tried to compile the CUDA op with CUDA 10.2 + PyTorch 1.11, but it reports an error. I also noticed in the README that the custom CUDA code is compiled for architectures up to 8.0 at the highest. How can I compile it with a higher CUDA version?

After executing the command "docker build -t look-for-the-change .", an error is reported

Hello, when I execute the command "docker build -t look-for-the-change .", an error is reported. The error is as follows:

E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/universe/f/ffmpeg/libavcodec57_3.4.11-0ubuntu0.1_amd64.deb Connection failed [IP: 91.189.91.39 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/a/alsa-lib/libasound2-data_1.1.3-5ubuntu0.6_all.deb Connection failed [IP: 185.125.190.39 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/universe/libb/libbs2b/libbs2b0_3.1.0+dfsg-2.2_amd64.deb Connection failed [IP: 91.189.91.39 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/universe/f/flite/libflite1_2.1-release-1_amd64.deb Connection failed [IP: 185.125.190.39 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/f/fftw3/libfftw3-double3_3.3.7-1_amd64.deb Connection failed [IP: 185.125.190.39 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/universe/n/norm/libnorm1_1.5r6+dfsg1-6_amd64.deb Connection failed [IP: 91.189.91.39 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/libc/libcdio/libcdio17_1.0.0-2ubuntu2_amd64.deb Connection failed [IP: 91.189.91.39 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/libc/libcdio-paranoia/libcdio-cdda2_10.2+0.94+2-2build1_amd64.deb Connection failed [IP: 185.125.190.39 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/universe/f/ffmpeg/libavdevice57_3.4.11-0ubuntu0.1_amd64.deb Connection failed [IP: 185.125.190.36 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/universe/f/ffmpeg/ffmpeg_3.4.11-0ubuntu0.1_amd64.deb Connection failed [IP: 91.189.91.39 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

I hope you can answer as soon as possible, thank you!

Action (state) precision calculation

Thank you for your work and sharing the code! I have one question about the evaluation.

From method/utils.py L39-L61, it seems that the calculation of action precision and state precision is based on one frame only: taking the argmax of pred_action and checking whether the annotation for that frame is action.

I would expect the action precision to be calculated by first finding all frames where pred_action > 0.5 and then checking how many of those frames have annotation = 3 (action).

May I know the motivation for choosing just one frame for the evaluation? For some categories with only a few (<10) test videos available, using one frame per video seems to introduce a lot of fluctuation in the numbers.
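For reference, the two evaluation variants discussed above can be sketched as follows (a hypothetical illustration with made-up arrays, not the repository's actual method/utils.py code; the convention that annotation value 3 marks action frames follows the issue text):

  import numpy as np

  pred_action = np.array([0.1, 0.3, 0.9, 0.6, 0.2])  # per-frame action scores
  annotation = np.array([1, 1, 3, 3, 2])             # 3 marks an action frame

  # Single-frame variant (as described above): score only the
  # highest-confidence frame of the video.
  top1_correct = annotation[np.argmax(pred_action)] == 3

  # Thresholded variant proposed in the issue: measure precision over
  # every frame whose score crosses 0.5.
  selected = pred_action > 0.5
  precision = (annotation[selected] == 3).mean() if selected.any() else 0.0

  print(top1_correct, precision)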
