cdc's Issues

Makefile:337: recipe for target 'build/src/caffe/layers/power_layer.cuo' failed


/usr/local/cuda-8.0/bin/nvcc -ccbin=/usr/bin/g++ -Xcompiler -fPIC -DDEBUG -g -O0 -I/usr/local/include/python2.7 -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/local/include -I/usr/local/cuda-8.0/targets/x86_64-linux/include -I/usr/include/hdf5/serial -Ibuild/src -I./src -I./include -I/usr/local/cuda-8.0/include -gencode arch=compute_20,code=sm_20 -gencode arch=compute_20,code=sm_21 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_50,code=compute_50 -c src/caffe/layers/power_layer.cu -o build/src/caffe/layers/power_layer.cuo
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
g++: error trying to exec 'as': execvp: No such file or directory
Makefile:337: recipe for target 'build/src/caffe/layers/power_layer.cuo' failed
make: *** [build/src/caffe/layers/power_layer.cuo] Error 1

This error appeared when running "make all". How can I solve it? Where is the .cuo file? I can only find pow_layer.o.
Thanks in advance

About inference speed

It is mentioned that your method can reach around 500 FPS. I wonder how you compute this speed. Does it include the time for reading frames and pre-processing them before they are input to the CDC network, since that is usually time-consuming? Hope to receive your reply soon.

The problem of assigning frame-level labels during training

Hello, Zheng

Thanks for sharing!
I am very interested in S-CNN and in CDC published at CVPR.

When running "gen_test_bin_and_list", the labels for every class are 0, even for multi-class problems.
So I think this script does not satisfy the requirements for producing the training set.

Do you think I am right? Looking forward to your reply.

Best regards

extract_image_feature generates "nan" probability

Hi,

Thanks for sharing the codes.
I followed the steps for reproducing the results on THUMOS14. However, I found that some probabilities in the .prob files generated by executing ./xfeat.sh were "nan" (not all probabilities in a single .prob file are "nan", only a few of them).
Any idea what may be wrong?

Thanks!

How did you process the multi-label frames?

First, thank you very much for sharing your work.
I still have a question about the training data.
How did you process the multi-label frames in the training data?
For example, for CliffDiving, almost all of the frames also belong to Diving.
When assigning one-hot labels to these frames, do you assign them
[0,0,0,0,0,1,0,0,1,0,......], or do you make two copies of the frames and assign them
[0,0,0,0,0,1,0,0,0,......] and [0,0,0,0,0,0,0,0,1,0,......] respectively?
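
To make the two options concrete, here is a small numpy sketch of what I mean (my own illustration, not code from this repository; the class indices are hypothetical, chosen to match the 1s in the vectors above):

import numpy as np

num_classes = 21                      # hypothetical: background + 20 THUMOS'14 action classes
CLIFF_DIVING, DIVING = 5, 8           # hypothetical indices matching the vectors above

# Option A: keep one copy of the frame with a multi-hot label vector
multi_hot = np.zeros(num_classes, dtype=np.float32)
multi_hot[[CLIFF_DIVING, DIVING]] = 1.0

# Option B: duplicate the frame and give each copy its own one-hot label vector
one_hot_cliffdiving = np.zeros(num_classes, dtype=np.float32)
one_hot_cliffdiving[CLIFF_DIVING] = 1.0
one_hot_diving = np.zeros(num_classes, dtype=np.float32)
one_hot_diving[DIVING] = 1.0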

Loss is -nan

I am trying to fine-tune a network. My dataset contains folders, each corresponding to a single class:

|--example1_class1/frames_1.jpg...frames_n.jpg
|--example2_class1/frames_1.jpg...frames_m.jpg
|--example1_class2/frames_1.jpg...frames_i.jpg
|--example2_class2/frames_1.jpg...frames_j.jpg
.
.
.

Each folder has more frames than the window size, which is 32.
I then ran gen_test_bin_and_list.py with the same default configuration; the code that computes v_label has also been modified.
Next, I ran finetuning.sh. I have only 6 classes, so the prototxt file was changed accordingly (please see log.train-val).
The strange thing is that the loss is -nan, and I do not have any clue how to debug this. I have also attached the train file as fb_train.prototxt.

What can I do to debug this error?
Thank you!
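
For reference, this is the kind of sanity check I would run on one generated window before suspecting the solver; it is my own sketch with hypothetical paths, and it assumes the bins are raw float32 arrays of shape (4, 32, 128, 171) with the fourth channel holding the per-frame class index (if the size assertion fails, that layout assumption is wrong for this setup). Out-of-range labels are a common cause of a -nan softmax loss:

import numpy as np

NUM_CLASSES = 7                 # hypothetical: 6 action classes + background
C, L, H, W = 4, 32, 128, 171    # assumed layout: 3 data channels + 1 label channel

win = np.fromfile('window/example1_class1/000001.bin', dtype=np.float32)  # hypothetical path
assert win.size == C * L * H * W, 'unexpected bin size: the assumed layout/dtype is wrong'
win = win.reshape(C, L, H, W)

labels = np.unique(win[3])      # values in the label channel
print('labels found in this window:', labels)
assert labels.min() >= 0 and labels.max() < NUM_CLASSES, 'label out of range for num_output'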

frame-level labeling and feature extraction output

Hi, when I run ./xfeat.sh I run into the following problems related to the feat output directory. As you said, the output results should be stored in feat, but:

  1. I ran the demo successfully, but nothing new was produced in demo/feat.
  2. I ran feature extraction on THUMOS/test successfully, but there was no output in feat either.

Thanks!

Some Confusion on CDC Fine-tuning Steps

Hi Mr. Shou,

I have the following questions about the training process:

  1. Are the steps listed below correct? I tried to follow them to fine-tune my own model, but I cannot get the right results.
  2. How do you deal with short training videos that have fewer than 32 frames? In gen_test_bin_and_list.py, if a video has fewer than 32 frames, it raises an error. I am not sure whether it causes problems if I simply ignore these short videos.

Thanks in advance.

==================================================================

Step1 Prepare pre-trained model

cd THUMOS14/training/init/
sh run_net_surgey_sports1m_convdeconv.sh

Step2 prepare your own training data

Extract frames from UCF101 (25 fps).
Generate the bin files and the list file with gen_test_bin_and_list.py (the test-set script), producing a list like the following (the window-start pattern is sketched after the listing):

/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c01/000001.bin
/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c01/000033.bin
/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c01/000065.bin
/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c01/000097.bin
/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c01/000129.bin
/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c01/000134.bin
/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c02/000001.bin
/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c02/000033.bin
/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c02/000065.bin
/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c02/000093.bin
/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c03/000001.bin
/home/qiqi/cdc/THUMOS14/training/window/v_ApplyEyeMakeup_g01_c03/000033.bin
.....
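
Incidentally, the start indices above (1, 33, 65, ..., with the last one shifted back, e.g. 000129 followed by 000134) suggest non-overlapping 32-frame windows whose final window is clamped so that it still ends on the last frame. A short sketch of that pattern as I read it from the listing (not code taken from gen_test_bin_and_list.py):

def window_starts(num_frames, win_len=32):
    # non-overlapping windows; 1-based start frames as in the .bin file names
    starts = list(range(1, num_frames - win_len + 2, win_len))
    # if frames remain at the end, add one last window clamped to end on the final frame
    if starts and starts[-1] + win_len - 1 < num_frames:
        starts.append(num_frames - win_len + 1)
    return starts

print(window_starts(165))   # [1, 33, 65, 97, 129, 134] -> matches c01 if it has 165 frames
print(window_starts(124))   # [1, 33, 65, 93]           -> matches c02 if it has 124 frames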

Step3 train

Run sh finetuning.sh and get convdeconv-TH14_iter_24390 in the snapshot folder.

An error while running the demo code

Hi,
I ran the demo file xfeat.sh but got an error.
Is there any suggestion on how to prevent it?
Thanks a lot.

I0718 16:52:56.477802 1052 net.cpp:322] Copying source layer relu6
I0718 16:52:56.477825 1052 net.cpp:322] Copying source layer drop6
I0718 16:52:56.477830 1052 net.cpp:322] Copying source layer fc7-1-convdeconv
I0718 16:52:57.688776 1052 net.cpp:322] Copying source layer relu7
I0718 16:52:57.688798 1052 net.cpp:322] Copying source layer drop7
I0718 16:52:57.688802 1052 net.cpp:322] Copying source layer predict
I0718 16:52:57.691529 1052 net.cpp:322] Copying source layer loss
E0718 16:52:57.796006 1052 extract_image_features.cpp:72] Extracting features for 1 batches
I0718 16:52:58.240720 1066 video_segmentation_data_layer.cpp:204] Restarting data prefetching from start.
E0718 16:52:58.391898 1052 extract_image_features.cpp:108] Extracted features of 4 images.
E0718 16:52:58.391916 1052 extract_image_features.cpp:112] Successfully extracted 4 features!

  *** Error in `../CDC/build/tools/extract_image_features.bin': free(): invalid next size (normal): 0x00000000024a41e0 ***
    ======= Backtrace: =========
    /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fef1496c7e5]
    /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fef1497537a]
    /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fef1497953c]
    /usr/lib/x86_64-linux-gnu/libprotobuf.so.9(_ZN6google8protobuf8internal28DestroyDefaultRepeatedFieldsEv+0x1f)[0x7fef194958af]
    /usr/lib/x86_64-linux-gnu/libprotobuf.so.9(_ZN6google8protobuf23ShutdownProtobufLibraryEv+0x8b)[0x7fef19494b3b]
    /usr/lib/x86_64-linux-gnu/libmirprotobuf.so.3(+0x20329)[0x7fef0389f329]
    /lib64/ld-linux-x86-64.so.2(+0x10de7)[0x7fef20835de7]
    /lib/x86_64-linux-gnu/libc.so.6(+0x39ff8)[0x7fef1492eff8]
    /lib/x86_64-linux-gnu/libc.so.6(+0x3a045)[0x7fef1492f045]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf7)[0x7fef14915837]
    ../CDC/build/tools/extract_image_features.bin[0x40d579]

label tool

Do you know of any convenient temporal annotation tool? Thank you.

issue about detection result

Hi, thanks for your wonderful work!
When I use your pretrained model to test detection results on THUMOS14, my mAP is about 4% lower. I found that my result classifies video_test_000051, video_test_001118, and video_test_001143 as background. I couldn't find any solution; could you give me some suggestions? I'm looking forward to your reply. Thank you very much!

Why is the reproduced per-frame mAP lower?

The mAP value generated by compute_framelevel_mAP.m is 0.1409, lower than the value of 44.4 reported in your paper. If I replace the model in xfeat.sh from thumos_CDC/convdeconv-TH14_iter_24390 with sports1m_C3D/conv3d_deepnetA_sport1m_iter_1900000, the generated mAP becomes 0.0171. I am confused as to why the reproduced mAP value is significantly lower than the reported one. Can you suggest any ideas?

The FFmpeg version you used to extract frames?

Hello Zheng,

Could you let me know which FFmpeg version you used to extract frames from the THUMOS14 dataset? Although I use your command from scnn/run_demo.m to extract frames,
cmd = ['../lib/preprocess/ffmpeg -i ' videodir videoname '.' videotype ' -r ' num2str(framerate) ... ' -f image2 ' framedir videoname '/' '%06d.jpg 2>' framedir 'frame_extract.log'];
the number of frames extracted is 1 less than yours for some videos. For example, I got only 845 frames from test_video_0000004.mp4, but yours is 846, according to the filename 000815.bin belonging to the 0000004 video in the prefix.lst file.
As a result, I got an error while running compute_framelevel_mAP.m because the length of label_test in multi-label-test.mat is different from my prob length (label_test: 1154579 vs prob: 1154473).
I am wondering whether the smaller number of extracted frames is caused by a different FFmpeg version. Mine is 3.3.3-1ubuntu1~16.04.york; what is yours?
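
In case it helps to localize the one-frame difference, here is a small check I use (my own sketch, with hypothetical paths): it compares the frame count reported by the container with the number of JPEGs actually written for one video, so you can see whether the discrepancy comes from the header or from the decode:

import glob
import cv2

video = 'videos/video_test_0000004.mp4'        # hypothetical path
frame_dir = 'frames/video_test_0000004/'       # hypothetical path

cap = cv2.VideoCapture(video)
reported = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))   # frame count claimed by the container
extracted = len(glob.glob(frame_dir + '*.jpg'))     # frames ffmpeg actually wrote
print('reported: %d, extracted: %d, diff: %d' % (reported, extracted, reported - extracted))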

Best regards,

issue about how to make training data

I found that the bin file's shape is (4,32,128,171), consisting of the segment data with shape (3,32,128,171) and the label with shape (1,32,128,171). When I make the training data, because the labels are one-hot, the label shape would become (22,32,128,171) and the final bin file's shape would be (25,32,128,171). Is that right? And can you share your training .lst file? I am confused about how to make the training set.
Thank you!
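
For reference, here is a minimal numpy sketch of packing one 32-frame window in the (3+1, 32, 128, 171) layout observed in the provided bins (my own illustration, not from the repo). It assumes float32 data and that the label channel simply stores the per-frame class index broadcast over the spatial dimensions, rather than a one-hot expansion; whether that is what gen_test_bin_and_list.py and the data layer actually expect still needs to be confirmed against the code:

import numpy as np

L_FRAMES, H, W = 32, 128, 171

# hypothetical inputs: one (32, 128, 171, 3) RGB window and its 32 per-frame class indices
frames = np.zeros((L_FRAMES, H, W, 3), dtype=np.float32)
frame_labels = np.zeros(L_FRAMES, dtype=np.float32)    # e.g. 0 = background, k = class k

seg = frames.transpose(3, 0, 1, 2)                              # -> (3, 32, 128, 171)
lab = np.broadcast_to(frame_labels[None, :, None, None],
                      (1, L_FRAMES, H, W)).astype(np.float32)   # -> (1, 32, 128, 171)
window = np.concatenate([seg, lab], axis=0)                     # -> (4, 32, 128, 171)
window.tofile('000001.bin')                                     # hypothetical output name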

Some questions about training on multi-label frames

First, thank you very much for sharing the code.
But I still have some questions about the paper and code:
1.
I found that some frames in the THUMOS14 validation set have multiple labels (e.g. CliffDiving and Diving, or CricketBowling and CricketShot). I also found that issue #2 asks the same question, and you said you simply treat the frames that belong to Diving but not CliffDiving as Diving. But how did you treat CricketBowling and CricketShot?
2.
In formula (3) of your paper, you say that z_n stands for the ground-truth class label of the n-th segment. Why is the label segment-wise rather than frame-wise? Should it be z_n(t)?
3.
In Section 3.4 (training data construction) of your paper, you say you only keep windows that have at least one frame belonging to actions. Do the action classes include Ambiguous?
4.
In the evaluation code, THUMOS14/eval/PreFrameLabeling/compute_framelevel_mAP.m, lines 19-20:

% remove ambiguous
prob=prob(label_test(:,22)==0,:);
label_test=label_test(label_test(:,22)==0,:);

But I found that the variable label_test from the file multi-label-test.mat is all zeros in dimension 22, i.e. max(label_test(:,22))=0, so lines 19-20 do nothing. I think the ground-truth labels you provided may be wrong: there actually are some ambiguous frames in the test videos that need to be removed.

test code

Hello, when I run your code for prediction, I find that it is based on MATLAB. Where can I get test code in Python?

how to download the source code?

Hi, thank you for your hard work; I'm very interested in it.
Your README says the source code can be found at https://bitbucket.org/columbiadvmm/cdc,
but when I opened the website I couldn't find the source code of CDC; I only found C3D and Caffe, and nothing about CDC itself.
Could you tell me where to download the source code?
Best wishes.
I'm looking forward to your reply, thank you very much.

issue about extracting frames from video

I tried to repeat your experiment on THUMOS14, so I downloaded the THUMOS14 test dataset and used part of the code from the C3D project to extract frames from the 213 videos in the test set. I got 1351825 frames in total, which is different from the number of frames you extracted (around 1157824, based on your post-processing code). Then I used your Python code to generate 42347 bin files, while yours was 36182. So I changed the number of mini-batches to 10567 and output 42347 features.

I generated my own per-frame ground-truth labels and ran your post-processing code, and finally got 0.1426 mAP. I found that the probabilities look bad: most frames have a high probability for background, and the others do not have a high enough probability for any action even when the background probability is low. Can you see what might be the problem? The code I used to extract frames is attached below.

# imports needed by the functions below
import os
import sys
import glob
import cv2

def get_action_video_id(input_dir):
    ''' Collect the unique video ids listed in the annotation files '''
    filenames = []
    for root, dirs, files in os.walk(input_dir):
        for i in files:
            f = open(os.path.join(input_dir, i))
            for line in f:
                if line[:18] not in filenames:
                    filenames.append(line[:18])
            f.close()
    return filenames

def get_frame_count(video):
    ''' Get frame counts and FPS for a video '''
    cap = cv2.VideoCapture(video)
    if not cap.isOpened():
        print "[Error] video={} can not be opened.".format(video)
        sys.exit(-6)

    # get frame counts
    num_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)

    # in case, fps was not available, use default of 29.97
    if not fps or fps != fps:
        fps = 29.97

    return num_frames, fps

def extract_frames(video, start_frame, frame_dir, num_frames_to_extract=16):
    ''' Extract frames from a video using opencv '''

    # check output directory
    if os.path.isdir(frame_dir):
        print "[Warning] frame_dir={} does exist. Will overwrite".format(frame_dir)
    else:
        os.makedirs(frame_dir)

    # get number of frames
    cap = cv2.VideoCapture(video)
    if not cap.isOpened():
        print "[Error] video={} can not be opened.".format(video)
        sys.exit(-6)

    # move to start_frame
    cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)

    # grab each frame and save
    for frame_count in range(num_frames_to_extract):
        frame_num = frame_count + start_frame
        print "[Info] Extracting frame num={}".format(frame_num)
        ret, frame = cap.read()
        if not ret:
            print "[Error] Frame extraction was not successful"
            sys.exit(-7)

        frame_file = os.path.join(
                frame_dir,
                '{0:06d}.jpg'.format(frame_num)
                )
        cv2.imwrite(frame_file, frame)

    return

def main():
    input_annotations_dir = '/home/rusu5516/TH14_Temporal_Annotations_Test/annotations/annotation/'
    filenames = get_action_video_id(input_annotations_dir)
    input_videos_dir = '/home/rusu5516/TH14_test_set_mp4/'
    frames_root = os.path.join(input_videos_dir, 'all_frames_pervideo')
    for video_id in filenames:
        out_dir = os.path.join(frames_root, video_id)
        # skip videos whose frames were already extracted
        if os.path.isdir(out_dir):
            continue
        video_path = os.path.join(input_videos_dir, video_id + '.mp4')
        num_frames, fps = get_frame_count(video_path)
        extract_frames(video_path, 0, out_dir, num_frames)

if __name__ == '__main__':
    main()

2 bugs that need to be fixed

Thank you for sharing your code; it's really good work. But there are some problems in your project.

  1. In frame_wise_softmax_loss_layer.cpp:86, you want to take the ambiguous frames into account, which works fine on a dataset that has an ambiguous class. But if I want to use this project on my own data, it becomes confusing. I think it is better to ignore ambiguous frames in preprocessing, not in the Caffe layer.

  2. During my training, I found that -nan appears randomly, even when I set lr=1e-9. I found that there is a typo in voxel_wise_softmax_layer.cpp:66.
    It should be max(scale_data[scale_.offset(i,0,l,h,w)],
    since bottom[0] and scale_ have different shapes. Otherwise a huge number may be subtracted in the next step, and -nan appears (see the numpy illustration below).
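
To illustrate the mechanism in plain numpy (this is just an illustration, not the Caffe layer itself): the standard trick is to subtract each position's own maximum before exponentiating; if the value subtracted is read from the wrong offset and happens to be a huge or uninitialized number, the exponentials all underflow to zero, the normalization becomes 0/0, and nan propagates into the loss:

import numpy as np

def stable_softmax(x):
    # subtracting the true per-position max keeps exp() in a safe range
    shifted = x - x.max()
    return np.exp(shifted) / np.exp(shifted).sum()

scores = np.array([5.0, 2.0, -1.0])
print(stable_softmax(scores))        # well-behaved probabilities

wrong_max = 1e30                     # stands in for a value read from the wrong offset
bad = np.exp(scores - wrong_max)
print(bad / bad.sum())               # 0 / 0 -> [nan nan nan], and the log-loss becomes -nan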

Some questions about the paper

Hi Zheng,
I am Ke Yang from NUDT. I sent you an e-mail, but the mail server told me the delivery failed, so I am posting it here:
I am very sorry to bother you again. I want to ask you some questions about some details in your paper.
In your CVPR 2017 paper titled "CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos":

  1. For the per-frame labeling result in Table 1, is the AP calculated on the ~410,000
    action frames or on the ~1,300,000 frames including all the background frames?
  2. In Section 3.4 "Optimization" you wrote "4 training epochs (within half a day) on THUMOS’14 with 48,780 training windows."
    But with non-overlapping windows, and excluding the segments that contain only background frames, there are only 20,000+ segments left.
  3. Could you please send me the per-frame labeling AP for each of the 20 action classes?

Thank you very much in advance!

Wish you a good day!
Ke Yang
NUDT
2017/03/28
