
rohitgirdhar / cater

103 stars · 6 watchers · 19 forks · 107.27 MB

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

Home Page: https://rohitgirdhar.github.io/CATER/

License: Apache License 2.0

Languages: Python 78.40%, C++ 18.00%, Shell 2.59%, Cuda 0.65%, CMake 0.33%, Objective-C 0.03%
Topics: video-recognition, action-recognition, deep-learning, clevr, video-understanding

cater's People

Contributors: rohitgirdhar

cater's Issues

corrupted videos in pre-generated data

There are several corrupted video files in the downloadable zip datasets, e.g. 'videos/CATER_new_004798.avi' in the
max2action task (see screenshot). We downloaded the archive multiple times and the same files are corrupted.
How should we handle this?

Code to reproduce the issue:

wget -O videos.zip https://cmu.box.com/shared/static/jgbch9enrcfvxtwkrqsdbitwvuwnopl0.zip
# extract one corrupted video
unzip videos.zip videos/CATER_new_004798.avi
# open in a media player:
# vlc videos/CATER_new_004798.avi
# mplayer videos/CATER_new_004798.avi
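
For anyone hitting this, a quick way to list every unreadable file (a minimal sketch, assuming OpenCV is installed and the archive has been extracted under videos/; this is not part of the CATER codebase):

    import glob
    import cv2  # pip install opencv-python

    # Flag any AVI that OpenCV cannot decode a first frame from.
    for path in sorted(glob.glob('videos/*.avi')):
        cap = cv2.VideoCapture(path)
        ok, _ = cap.read()
        if not ok:
            print('corrupted:', path)
        cap.release()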

Error in gt? Label 'spl' does not refer to the snitch.

Hi, when visualizing the results of the tracking method (e.g. with DEBUG_STORE_VIDEO turned on), I found that the gt location for 'spl' marks not the snitch but the ball. By the way, I can reproduce the reported number of 33.9 exactly.
As you can see in the following figure and videos, the gt (blue rectangle) follows the yellow ball, not the snitch. I checked that the gt coordinates match what is shown in the visualization.

cater_debug_tracking_videos.zip

Actions per frame

Hi Authors,
Thanks for your work.
I generated 3 videos to test the dataset pipeline.
While actions_order_dataset seems to return frames, labels, and classes, the output file (train.txt) under the action_order_uniq folder contains no per-frame information.

Each of its lines looks something like:
/images/CLEVR_new_000002.avi 53,54,60,69,70,71,72,74,77,78,81,83,129,138,144,153,155,156,157,161,162,165,167,173,179,187,188,195,197,198,200,203,204,207,209,257,263,264,265,270,272,279,281,282,284,287,288,291,292,293,381,382,383,387,389,390,392,396,398,405,407,408,410,411,412,413,414,415,417,419,423,425,430,431,432,434,438,440,447,449,450,452,455,456,459,460,461,465,471,474,480,489,490,491,492,495,497,498,501,502,509,515,518,524,532,533,536,539,545,549,551,555,557,558,560,564,565,566,573,575,576,577,578,580,581,582,585,586,587

How can I get frame-by-frame actions and classes?
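
For reference, a minimal sketch of parsing such a line, assuming the format is a video path followed by a comma-separated list of class indices (the index-to-action-class mapping is not shown here):

    def parse_line(line):
        """Split 'path idx1,idx2,...' into (path, [int indices])."""
        path, labels = line.strip().split(' ', 1)
        return path, [int(x) for x in labels.split(',')]

    path, classes = parse_line('/images/CLEVR_new_000002.avi 53,54,60')
    print(path, classes)  # -> /images/CLEVR_new_000002.avi [53, 54, 60]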

3d_coords to pixel_coords transformation

Hello Authors,
For our work we require the pixel coordinates of all the objects. We can read the world coordinates (3d_coords) of the objects from the JSON files. Is there a way to convert these directly to pixel_coords?

We tried to compute a homography from corresponding initial 3d_coords and pixel_coords available in the JSON files, but it does not produce an accurate transformation (a homography only models plane-to-plane mappings, so it cannot capture a general 3D-to-2D perspective projection).

The function get_camera_coords in utils.py seems to do the job, but we do not have access to the camera parameters needed to run it.

Thank You

Broken zip files of pre-generated data.

Hi, I appreciate your wonderful work. However, when I unzip the pre-generated data downloaded from the direct links, the archiver reports that some files inside are corrupted. I have confirmed that the zip file's size is correct, and simply retrying the download does not fix it. How can I address this?

How to get the GT masks of objects?

Hi, I want to run some segmentation code (e.g. Mask R-CNN), so I need the ground-truth masks of the objects in the scene. CLEVRER provides them, but CATER does not. Do you plan to release the object masks? If not, how can we generate them?

Virtual Machine Specification not available

Dear Authors @rohitgirdhar ,

I tried to download and use the VM spec, but the downloaded spec_v0.img file is corrupted.

May I ask whether the link to download the VM spec is updated?
Could you kindly share the working URL to download the VM spec for CATER?

I used the download link on this webpage: https://cmu.box.com/s/krg7ehliaidruxjk21nfxsa0gge2uf2o, as provided in CATER-master/generate/README.md. However, when I try to open the downloaded 3.0 GB spec_v0.img file, it says "The disc image file is corrupted". If I point launch.py at the downloaded spec_v0.img and run it, it prompts "singularity: not found", as if there were no spec_v0.img file at all. I have tried downloading and opening spec_v0.img on macOS, Windows, and Ubuntu, all with the same result. I also tried downloading it with wget, but I could not find a direct URL for the file.

I'm interested in your work and would like to learn from it.
Thank you very much!

Error due to changes in video folder files in official Pytorch directory

Hello Rohit, I ran into yet another error which I could not solve despite trying everything I could over the last 4 days.

In file included from /home/u1698461/Downloads/CATER-master/pytorch_once_again/pytorch/caffe2/video/customized_video_io.cc:25:0:
/home/u1698461/Downloads/CATER-master/pytorch_once_again/pytorch/caffe2/video/customized_video_io.h:30:10: fatal error: caffe/proto/caffe.pb.h: No such file or directory
#include "caffe/proto/caffe.pb.h"
^~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
caffe2/CMakeFiles/torch_cpu.dir/build.make:6166: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/video/customized_video_io.cc.o' failed
make[2]: *** [caffe2/CMakeFiles/torch_cpu.dir/video/customized_video_io.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from /home/u1698461/Downloads/CATER-master/pytorch_once_again/pytorch/caffe2/video/customized_video_input_op.cc:25:0:
/home/u1698461/Downloads/CATER-master/pytorch_once_again/pytorch/caffe2/video/customized_video_input_op.h:43:10: fatal error: caffe2/utils/thread_pool.h: No such file or directory
#include "caffe2/utils/thread_pool.h"
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
caffe2/CMakeFiles/torch_cpu.dir/build.make:6153: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/video/customized_video_input_op.cc.o' failed
make[2]: *** [caffe2/CMakeFiles/torch_cpu.dir/video/customized_video_input_op.cc.o] Error 1
CMakeFiles/Makefile2:9446: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/all' failed
make[1]: *** [caffe2/CMakeFiles/torch_cpu.dir/all] Error 2
Makefile:159: recipe for target 'all' failed
make: *** [all] Error 2

Steps Followed:

  1. Used Python 3.6.
  2. Cloned the official PyTorch repository and replaced caffe2/video with caffe2_customized_ops/video.
  3. Ran "conda install --yes opencv" to include the video-processing operators when building caffe2.
  4. Set USE_FFMPEG=ON and USE_OPENCV=ON in pytorch/CMakeLists.txt.
  5. Followed the Linux installation steps from https://github.com/pytorch/pytorch.

I think the reason for this error is that the provided caffe2_customized_ops/video is a modified version of an older caffe2/video, which has since changed in the official PyTorch repository.

I tried reaching Xiaolong Wang by email, but did not get a reply.

It would be really helpful if you could spare a couple of hours over the weekend on this issue. The work I have been doing on the CATER dataset over the last 4 months will be badly affected if I cannot run R3D.

Please let me know if you need any more information.

Thank You

Error while testing

Hello Rohit,

While trying to run the following command for testing:
"python launch.py -c configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml"

I am getting the following error:

Running
PYTHONPATH=`pwd`/lib/:$PYTHONPATH PYTHONPATH=`pwd`/external_lib/average-precision/python:$PYTHONPATH python tools/test_net_video.py --config_file configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml CHECKPOINT.DIR outputs/configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml TEST.TEST_FULLY_CONV True 2>&1 | tee outputs/configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml/log_test_net_video.py.txt

/home/u1698461/Downloads/CATER-master/baselines/video-nonlocal-net/lib/core/config.py:349: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
yaml_config = AttrDict(yaml.load(fopen))
Ignoring @/caffe2/caffe2/contrib/nccl:nccl_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops_gpu as it is not a valid file.
Traceback (most recent call last):
File "tools/test_net_video.py", line 351, in
main()
File "tools/test_net_video.py", line 336, in main
cfg_from_file(args.config_file)
File "/home/u1698461/Downloads/CATER-master/baselines/video-nonlocal-net/lib/core/config.py", line 350, in cfg_from_file
merge_dicts(yaml_config, __C)
File "/home/u1698461/Downloads/CATER-master/baselines/video-nonlocal-net/lib/core/config.py", line 333, in merge_dicts
type(dict_b[key]), type(value), key)
ValueError: Type mismatch (<class 'bytes'> vs. <class 'str'>) for config key: DATADIR

Initially I was getting a similar error while generating the LMDBs by running: python process_data/cater/gen_lmdbs.py

Traceback (most recent call last):
File "create_video_lmdb.py", line 116, in
main()
File "create_video_lmdb.py", line 111, in main
create_an_lmdb_database(args.list_file, args.dataset_dir)
File "create_video_lmdb.py", line 70, in create_an_lmdb_database
video_tensor.string_data.append(video_data)
TypeError: '/home/u1698461/Downloads/CATER-master/max2action/videos/CATER_new_005360.avi' has type str, but expected one of: bytes

which I resolved by changing the following line in create_video_lmdb.py,
from:
def create_an_lmdb_database(list_file, output_file, use_local_file=True):
to:
def create_an_lmdb_database(list_file, output_file, use_local_file=False):
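
(For reference, an alternative fix for this Python 3 str-vs-bytes mismatch, sketched under the assumption that the protobuf field expects bytes, would be a one-line change inside create_an_lmdb_database instead of flipping the use_local_file default:)

    # video_data holds the path as str when use_local_file=True;
    # protobuf string_data expects bytes under Python 3, so encode it:
    video_tensor.string_data.append(video_data.encode('utf-8'))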

But I am not able to resolve the error at the top.
Can you please point me to a solution, or explain what to look for to solve the error?

I have installed Caffe2 in a python=3.6 environment using: "conda install pytorch-nightly-cpu -c pytorch"

Thank you.

Weights of Non-local Networks, Fine-tuned on CATER

Hi Rohit,

Awesome work. Thanks for contributing to the community and for sharing the code.
Could you share the non-local network model weights that you fine-tuned on CATER?

Thanks,
Nour

Understanding TSN setup

In Table 3 of the paper, you apparently used 1 or 3 frames for the TSN experiments.
What does that mean? Why did you train using 3 frames and test using 250 frames?
Task 3 is really challenging, and it doesn't seem solvable from only 3 frames, so I must be misunderstanding the setup.
Does it mean you sampled 3 frames per segment? If so, how many segments are used, and how many total frames are seen at training time?

Also, what is the detailed setup for TSN+LSTM? It appears that you used 10 clips for the LSTM on the 3D models; with TSN, did you still use 10 "frames"? How did you set it up?

Lastly, do you have any plans to release the TSN code?

Thanks a lot for your awesome research!!

clarify install instructions

It's not clear how to install the dependencies for generation. The instructions say "all CLEVR requirements", but that doesn't really help because the CLEVR repo also lacks clear installation instructions.

It would be better to list the steps in detail or to provide an install script.

Render time

Hi, I'm using Blender 2.83 because 2.79b raises a missing GLU.so error. I have enabled CUDA. However, rendering seems extremely slow (more than one hour per video). Is that normal?
(screenshot: 2020-07-24 00:06:48)

LSTM code details

In the LSTM code, I notice the instruction 'To run the LSTM code, first extract the features using the TSN trained models'.
I can't figure out how to use this code. Can you provide some details, or the h5/pkl files mentioned below?

def read_data(data_dir):
    if osp.exists(args.data_dir + '_val_feats.h5'):
        print('This looks like TSN outputs, reading it so.')
        val_data = read_data_tsn(args.data_dir + '_val_feats.h5')
        train_data = read_data_tsn(args.data_dir + '_train_feats.h5')
    elif osp.exists(osp.join(
            args.data_dir, 'results_probs_test_fullLbl.pkl')):
        print('This looks like NL outputs, reading it so.')
        assert args.lbl_dir is not None, (
            'lbl_dir must be set for NL models, since the labels are not '
            'stored in the PKL file.')
        val_data = read_data_nl(
            osp.join(args.data_dir, 'results_probs_test_fullLbl.pkl'),
            osp.join(args.lbl_dir, 'val.txt'))
        train_data = read_data_nl(
            osp.join(args.data_dir, 'results_probs_train_fullLbl.pkl'),
            osp.join(args.lbl_dir, 'train.txt'))
    else:
        raise NotImplementedError('Dunno how to read data directory {}'.format(
            data_dir))
    return train_data, val_data
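
(Judging from the branching above, read_data expects either TSN features at <data_dir>_train_feats.h5 and <data_dir>_val_feats.h5, or NL outputs at <data_dir>/results_probs_{train,test}_fullLbl.pkl plus an --lbl_dir containing train.txt/val.txt. A small sketch that checks which layout a directory matches, mirroring that logic:)

    import os.path as osp

    def detect_layout(data_dir):
        """Mirror read_data's branching to report the expected layout."""
        if osp.exists(data_dir + '_val_feats.h5'):
            return 'tsn'   # <data_dir>_{train,val}_feats.h5
        if osp.exists(osp.join(data_dir, 'results_probs_test_fullLbl.pkl')):
            return 'nl'    # pkl probs + --lbl_dir with train.txt/val.txt
        return 'unknown'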

Thanks!

Spatial Relationships

Hi,

Thanks for your interesting work.

Could you please explain the spatial relationships data?

    F = json.load(open('max2action/scenes/CATER_new_004617.json'))
    len(F['relationships']['behind'])  # -> 56
    # number of frames: 40

The original spatial relations should mean: if j is in F['relationships']['behind'][i], then object j is behind object i (see here). However, here the index i cannot be an object index, since there aren't 56 objects in the scene. Could you clarify?
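
(For context, the CLEVR-style convention being referenced reads like this:)

    # CLEVR convention: if j is in F['relationships']['behind'][i],
    # then object j is behind object i.
    def is_behind(F, j, i):
        return j in F['relationships']['behind'][i]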

Thanks,

Originally posted by @roeiherz in #3 (comment)

Porting CATER into NVIDIA NeMo

I'm working on making CATER one of the available datasets in NVIDIA NeMo. To do so, I need to be able to download all the data programmatically through static download links. As far as I can tell, only the Scenes and Videos directories have static download links; the Lists directory can only be downloaded manually from box.com. Could you also add static download links for the Lists directory? Thank you.

Which caffe2 installation procedure did you follow: the official caffe2 merged into PyTorch, or a build from source?

Hello Rohit,
I am getting the following error while running:
"python launch.py -c configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml -t test"

Traceback (most recent call last):
File "tools/test_net_video.py", line 351, in
main()
File "tools/test_net_video.py", line 346, in main
store_vis=args.store_vis)
File "tools/test_net_video.py", line 294, in test_net
store_vis=store_vis)
File "tools/test_net_video.py", line 110, in test_net_one_section
test_model.build_model()
File "/home/u1698461/Downloads/CATER-master/baselines/video-nonlocal-net/lib/models/model_builder_video.py", line 119, in build_model
train=self.train, force_fw_only=self.force_fw_only
File "/home/u1698461/Downloads/CATER-master/baselines/video-nonlocal-net/lib/models/model_builder_video.py", line 230, in create_data_parallel_model
use_nccl=not cfg.DEBUG, # org: True
File "/home/u1698461/anaconda3/envs/last/lib/python2.7/site-packages/caffe2/python/data_parallel_model.py", line 39, in Parallelize_GPU
Parallelize(*args, **kwargs)
File "/home/u1698461/anaconda3/envs/last/lib/python2.7/site-packages/caffe2/python/data_parallel_model.py", line 236, in Parallelize
input_builder_fun(model_helper_obj)
File "/home/u1698461/Downloads/CATER-master/baselines/video-nonlocal-net/lib/models/model_builder_video.py", line 207, in add_video_input
batch_size=batch_size,
File "/home/u1698461/Downloads/CATER-master/baselines/video-nonlocal-net/lib/models/model_builder_video.py", line 171, in AddVideoInput
data, label = model.net.CustomizedVideoInput(
File "/home/u1698461/anaconda3/envs/last/lib/python2.7/site-packages/caffe2/python/core.py", line 2205, in getattr
",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []

Can you please guide me through the procedure you followed to install caffe2?

I used the following command to install caffe2 with python=2.7:
conda install pytorch-nightly-cpu -c pytorch

Is porting the code to Python 3 possible?

Hello Rohit,

xiaolonw's caffe2 repository is missing some modules, so I cannot build caffe2 from it.
The official version of caffe2 (merged into PyTorch) does not support Python 2.7, so building from the official repository is not an option either.

It seems the only way to use your code is to port it to Python 3.6.

So, is there any way your code can be ported to Python 3.6?

Any other suggestions would also be really helpful.

Thank You

yaml file for Task 1

Hello Rohit,

The yaml file in the repository corresponds to Task 3.
Can you please share the yaml file for Task 1?

Thank You

Localization of classes in Task 1

Hello Authors,

Could you give an idea of whether the R3D network would be useful for localizing the actions as well?

As mentioned in the paper, the actions are restricted to occur within one of the 10 slots of 30 frames each. So instead of feeding in the whole video (12.5 s), if we split each video into 10 parts (1.25 s each) and feed those to the R3D model, would the accuracy be close to the reported 98%?
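
(For concreteness, a minimal sketch of the split being proposed, assuming 300-frame videos divided into 10 equal slots of 30 frames as described in the paper:)

    def slot_ranges(num_frames=300, num_slots=10):
        """Start/end frame indices for equal-length temporal slots."""
        slot_len = num_frames // num_slots  # 30 frames per slot
        return [(i * slot_len, (i + 1) * slot_len) for i in range(num_slots)]

    print(slot_ranges())  # [(0, 30), (30, 60), ..., (270, 300)]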

Thank You

Names of the classes in gt files.

Hi, I'd like to get the names of the classes in the gt files for Task 1 and Task 2, which should be 14 and 301 classes in total, respectively. Can you help me?

bounding box annotations

Hi,

Is there any way to extract per-frame bounding box annotations?
I managed to extract (cx, cy) for each box, but how can I compute the width and height of each box?
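
One common workaround, sketched below: project the 8 corners of each object's 3D bounding box and take the extent in pixel space. This assumes a project() function mapping world points to pixels (like the sketch in the 3d_coords issue above) and that the 3D half-extents can be estimated from the object's size attribute; neither is shipped with CATER:

    import itertools
    import numpy as np

    def box_wh(center_3d, half_extents, project):
        """Pixel-space (width, height) of a 3D axis-aligned box.

        center_3d    : (3,) the object's 3d_coords
        half_extents : (3,) estimated half-sizes along x, y, z
        project      : callable mapping a 3D world point to (u, v)
        """
        corners = [np.asarray(center_3d) + np.array(s) * np.asarray(half_extents)
                   for s in itertools.product((-1, 1), repeat=3)]
        uv = np.array([project(c) for c in corners])
        return (uv[:, 0].max() - uv[:, 0].min(),   # width in pixels
                uv[:, 1].max() - uv[:, 1].min())   # height in pixels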

Many thanks,
