soCzech / TransNetV2

TransNet V2: Shot Boundary Detection Neural Network

Home Page: https://arxiv.org/abs/2008.04838

License: MIT License

Python 99.33% Dockerfile 0.59% PureBasic 0.08%

shot-boundary-detection shot-detection

transnetv2's Issues

Importing a SavedModel with tf.saved_model.load requires a 'tags=' argument

Hi Tomas,

Awesome work here, I really appreciate it.

I followed your instructions and set up a conda environment with python=3.6, tensorflow=2.1, pytorch=1.7.1, cudatoolkit=10.1.

The following error occurs

ValueError: Importing a SavedModel with tf.saved_model.load requires a 'tags=' argument if there is more than one MetaGraph. Got 'tags=None', but there are 0 MetaGraphs in the SavedModel with tag sets []. Pass a 'tags=' argument to load this SavedModel.

when I tried to run

python transnetv2.py some_video.mp4

as well as

python convert_weights.py

I would love to hear from you on how to resolve this error.

Thank you again!

About weight download

Is there a link where I can download the weights manually? The download always times out when my server pulls them. I look forward to hearing from you soon. Thank you very much.

Working with longer dissolves and 60fps videos?

Hi Tomas,

Thank you so much for putting this code online. You did an excellent job on a model that is still considered SOTA two years after it was posted!

I'm testing the model on a certain dataset and wanted to ask a few questions:

Are you able to share the .h5 file from your training? I want to try fine-tuning the model on my data and see whether that gives better accuracy than training only on my dataset.
I'm working with 60 fps video, and my dissolves sometimes reach ~70 frames. What would you suggest changing if I want to augment with longer transitions? More generally, would you suggest any changes for higher-fps videos and transitions?

I'll really appreciate your help! Happy holidays!

How can you convert .h5 to saved model format?

Hi,
I've trained on the ClipShots dataset and got the weights-30.h5 file.
How can I convert this .h5 weight file to the saved model format?

about gin

Although I installed gin successfully, I still cannot import gin. Do you know why?

How to configure the transnetv2.gin and reproduce the F1 of 77.9 on the ClipShots test set?

Hi Tomáš,

Your work helps me a lot in understanding your great paper! Thank you so much!

I downloaded the ClipShots training and training-transitions datasets and processed them according to https://github.com/soCzech/TransNetV2/blob/master/training/consolidate_datasets.py and https://github.com/soCzech/TransNetV2/blob/master/training/create_dataset.py

I downloaded the ClipShots test dataset and processed it accordingly.

I also downloaded the IACC.3 dataset and processed it with the type "train".

I added the ClipShots training, training-transitions and IACC.3 files to "options.trn_files" in https://github.com/soCzech/TransNetV2/blob/master/configs/transnetv2.gin, and added the ClipShots test files to "options.tst_files". I also changed "options.n_epochs" to 50, as indicated in the paper.

However, I can only obtain an F1 of 0.74. Could you please give more training details and instructions on how to reproduce 77.9 on the test set?

What do the file names in "options.tst_files" mean, and how are these files generated?

I also used the pretrained weights in https://github.com/soCzech/TransNetV2/tree/master/inference/transnetv2-weights to evaluate on the ClipShots test dataset by setting "options.restore" and "options.test_only" to True in https://github.com/soCzech/TransNetV2/blob/master/configs/transnetv2.gin. I only get an F1 of 0.2545 and cannot reproduce 77.9.

I appreciate your great help so much!

Wentao

Cannot parse file b'/TransNetV2/inference/transnetv2-weights/saved_model.pb': Error parsing message.

Hi, I tried to use TransNetV2. I followed these steps:

from transnetv2 import TransNetV2

# location of learned weights is automatically inferred
# add argument model_dir="/path/to/transnetv2-weights/" to TransNetV2() if it fails
model = TransNetV2()
video_frames, single_frame_predictions, all_frame_predictions = \
    model.predict_video("video.mp4")

But it shows the error in the title: the .pb file cannot be parsed. Am I missing something? Could you please help with that? Thank you.

key-frame extraction

Great job. I have a question: how can I extract key-frames together with the caption of each frame? Alternatively, how can I get the timestamps of the key-frames from the network so that I can look up the captions?
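
For the timing part of this question, a minimal sketch, assuming shots are already available as (start, end) frame-index pairs and the video's frame rate is known. Taking the middle frame of each shot as its key-frame is an illustrative choice, and the function name `keyframe_times` is hypothetical. The resulting timestamps could then be matched against caption or subtitle time ranges.

```python
def keyframe_times(scenes, fps):
    """Map each (start_frame, end_frame) shot to (keyframe_index, seconds).

    The key-frame is taken to be the middle frame of the shot, which is an
    illustrative assumption, not something the model prescribes.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    result = []
    for start, end in scenes:
        key = (start + end) // 2         # middle frame of the shot
        result.append((key, key / fps))  # frame index converted to seconds
    return result


# Example: three shots in a 25 fps video.
for key, seconds in keyframe_times([(0, 49), (50, 199), (200, 224)], fps=25.0):
    print(f"key-frame {key} at {seconds:.2f} s")
```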

the output frame index is discontinuous

Hello, thank you for sharing this great work. When I test it on my own video, I find some discontinuities in the frame indices, like this.

ffmpeg._run.Error

Traceback (most recent call last):
  File "/home/tom/projects/Studium/Studienarbeit/cutting/TransNetV2/inference/transnetv2.py", line 193, in <module>
    main()
  File "/home/tom/projects/Studium/Studienarbeit/cutting/TransNetV2/inference/transnetv2.py", line 173, in main
    model.predict_video(file)
  File "/home/tom/projects/Studium/Studienarbeit/cutting/TransNetV2/inference/transnetv2.py", line 83, in predict_video
    video_stream, err = ffmpeg.input(video_fn).output(
  File "/home/tom/projects/Studium/Studienarbeit/cutting/env/lib/python3.9/site-packages/ffmpeg/_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

I get an ffmpeg error when I run the following command:

python transnetv2.py /mnt/e/Studium/Studienarbeit/Videos/2021/reg/17/c597512d-b37c-11eb-ba8a-ecb6fe06b3b0/highlightsVideo/video/video.mp4 [--visualize]

That's my ffmpeg:

ffmpeg version 4.2.4-1ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
  configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared

Is there any problem with my ffmpeg build?

Command-line for shots detection and split in tensorflow or pytorch

I am impressed with this tool.
I would like to ask for the command lines for

splitting an mp4 into shots and
detecting shots in an mp4
using the tool.

Thank you for any help.
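
Detection is invoked in other issues on this page as `python transnetv2.py some_video.mp4`. For the splitting step, here is a sketch under stated assumptions: shot boundaries are available as (start, end) frame pairs with inclusive ends, the frame rate is known, and the `ffmpeg` binary is on the PATH. The helper names and the output naming pattern are hypothetical and not part of the repository.

```python
import subprocess


def split_commands(video, scenes, fps, pattern="shot_{:04d}.mp4"):
    """Build one ffmpeg command per shot; scenes hold inclusive frame indices."""
    commands = []
    for i, (start, end) in enumerate(scenes):
        start_s = start / fps
        duration_s = (end - start + 1) / fps
        commands.append([
            "ffmpeg", "-y",
            "-ss", f"{start_s:.3f}",    # seek to the start of the shot
            "-i", video,
            "-t", f"{duration_s:.3f}",  # keep only this shot's duration
            pattern.format(i),          # hypothetical output name
        ])
    return commands


def split_video(video, scenes, fps):
    """Run the commands; each shot is re-encoded with ffmpeg's defaults."""
    for cmd in split_commands(video, scenes, fps):
        subprocess.run(cmd, check=True)
```

Re-encoding keeps the cuts frame-accurate; stream copy would be faster but can only cut at keyframes.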

what does different line color mean in the visualization?

Great work!!!
I used it on my own videos, but the visualization contains both green and blue lines. What do the different colors mean?

Weights Loading

Thank you for the repo! I would like to compare TransNetV2 with TransNet on some videos, but I have an issue loading the weights. As described in inference/README.md, I just ran:

python transnetv2.py test.mp4

I get the following errors that start at line transnetv2.py:17:

    saved_model.ParseFromString(file_content)
google.protobuf.message.DecodeError: Error parsing message

I noticed that the size of transnetv2-weights is only 8.4k. Is that correct?
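
One thing worth ruling out given that size: if the weights are tracked with Git LFS (an assumption here), a clone made without LFS leaves small text pointer files in place of the real binaries, and a parser expecting a protobuf would reject them. A minimal sketch of a check follows; the helper names are hypothetical, and the directory name is the one shown in the logs on this page.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS pointer format.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts like a Git LFS pointer, not real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


def report(weights_dir):
    """Print size and pointer status for every file under weights_dir."""
    for file in sorted(Path(weights_dir).rglob("*")):
        if file.is_file():
            status = "LFS pointer, not downloaded" if is_lfs_pointer(file) else "ok"
            print(f"{file}: {file.stat().st_size} bytes ({status})")


# Usage, from the inference folder:
# report("transnetv2-weights")
```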

How can I get the dataset?

how to generate .npy data for evaluation?

Hello, evaluate.py contains files = glob.glob(os.path.join(args.directory, "*.npy")). Do we need to convert mp4 to npy? How do I generate the npy data? Looking forward to your reply, thanks!

ValueError

  File "/home/y202202005/workspace/TransNetV2/inference/transnetv2.py", line 165, in main
    model = TransNetV2(args.weights)
  File "/home/y202202005/workspace/TransNetV2/inference/transnetv2.py", line 18, in __init__
    self._model = tf.saved_model.load(model_dir)
  File "/home/y202202005/.conda/envs/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 603, in load
    return load_internal(export_dir, tags, options)
  File "/home/y202202005/.conda/envs/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 649, in load_internal
    root = load_v1_in_v2.load(export_dir, tags)
  File "/home/y202202005/.conda/envs/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 263, in load
    return loader.load(tags=tags)
  File "/home/y202202005/.conda/envs/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 188, in load
    meta_graph_def = self.get_meta_graph_def_from_tags(tags)
  File "/home/y202202005/.conda/envs/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 76, in get_meta_graph_def_from_tags
    raise ValueError(
ValueError: Importing a SavedModel with tf.saved_model.load requires a 'tags=' argument if there is more than one MetaGraph. Got 'tags=None', but there are 0 MetaGraphs in the SavedModel with tag sets []. Pass a 'tags=' argument to load this SavedModel.

DecodeError: Error parsing message

Hello, I'm trying to run the model but I get the following error

020-06-22 00:20:45.880699: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
[TransNetV2] Using weights from transnetv2-weights/.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 98, in parse_saved_model
    saved_model.ParseFromString(file_content)
google.protobuf.message.DecodeError: Error parsing message

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "transnetv2.py", line 188, in <module>
    main()
  File "transnetv2.py", line 160, in main
    model = TransNetV2(args.weights)
  File "transnetv2.py", line 17, in __init__
    self._model = tf.saved_model.load(model_dir)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 578, in load
    return load_internal(export_dir, tags)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 588, in load_internal
    loader_impl.parse_saved_model_with_debug_info(export_dir))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 56, in parse_saved_model_with_debug_info
    saved_model = _parse_saved_model(export_dir)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 101, in parse_saved_model
    raise IOError("Cannot parse file %s: %s." % (path_to_pb, str(e)))
OSError: Cannot parse file b'transnetv2-weights/saved_model.pb': Error parsing message.

Any idea what could be wrong? Thanks.

pytorch weight

Hi, could you provide the converted PyTorch weights? TensorFlow 2.1 is no longer available.

About weight transfer

The open-source model is saved in pb format, while the training model is saved in h5 format. How is the h5 file converted to pb? Is the model saved directly in pb format during training, or is it converted afterward?

path to scenes gt for -mapping in create_train_dataset

Hi,
First of all, thank you very much for this work.
I have a question that may be a bit silly, but I couldn't find an answer. I would like to run the code for create_train_dataset, but I can't work out how to fill in the mapping_fn parameter: could you tell me what /path/to/scenes/gt is, please? :) I really have no idea which file to put there.

Get shot boundaries using pytorch version

Hi there,
how can I get the shot boundaries using the PyTorch version?

OpError: not an sstable (bad magic number)

Hi, I tried to run the code as follows.

from transnetv2 import TransNetV2
model = TransNetV2()

The first time, it showed a parse error. I re-downloaded the .pb model, and now it shows OpError: not an sstable (bad magic number). I'm not sure what happened.

I also tried TransNet https://github.com/soCzech/TransNet. It works pretty well, and I like the visualization. Could you please save TransNet and its weights as a .pb model? I want to use it with OpenCV and C++. Do you have any suggestions on how I can use it with OpenCV dnn? Given a video, what should I do to prepare the input for the model? Thank you very much.

Question for the 'only_gradual' set in ClipShots and should we use it for training?

Could you please describe the 'only_gradual' set in more detail? Should we use it for training?

Is there a pytorch implementation of this network structure?

where do you download the original data?

You only told us how to process the data, but where do you download the original data? This is not reflected in the code.

Is it based on PyTorch?

training.py can't load models.py

Hi,

Thank you so much for putting this model out. Excellent work!
I'm trying to train the model and have stumbled upon a problem that points to the models.py file.
When I try to run training.py with the gin file, this is the error message I get:

Traceback (most recent call last):
  File "C:\videoseg\TransNetV2\training\training.py", line 10, in <module>
    import models
  File "C:\videoseg\TransNetV2\training\models.py", line 168, in <module>
    @gin.configurable(blacklist=["name"])
TypeError: configurable() got an unexpected keyword argument 'blacklist'

Do you have a sense of what could be the problem?
Thanks!
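
A possible workaround, sketched under the assumption that the installed gin-config release renamed the `blacklist` keyword to `denylist`: inspect the decorator's signature and pass whichever keyword it accepts. The helper `exclusion_keyword` and the two stand-in functions are hypothetical and exist only to keep the sketch runnable without gin installed.

```python
import inspect


def exclusion_keyword(decorator_factory):
    """Return 'denylist' if the callable accepts it, otherwise 'blacklist'."""
    params = inspect.signature(decorator_factory).parameters
    return "denylist" if "denylist" in params else "blacklist"


# Stand-ins for an older and a newer configurable() signature (illustration only).
def old_configurable(name_or_fn=None, module=None, whitelist=None, blacklist=None):
    return name_or_fn


def new_configurable(name_or_fn=None, module=None, allowlist=None, denylist=None):
    return name_or_fn


print(exclusion_keyword(old_configurable), exclusion_keyword(new_configurable))

# With gin imported, the decorator in models.py could then be written as:
# @gin.configurable(**{exclusion_keyword(gin.configurable): ["name"]})
```

Pinning gin-config to the release the code was written against would be the other option, if that release is known.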

extract frames with captions

First, this work is great. I would like to ask how I can extract key-frames from a video together with the caption of each frame.

Publish to pypi

Hi @soCzech

Thank you for creating this repo. Have you thought about publishing this library to PyPI?

Wenbing

OSError: Cannot parse file b'./saved_model.pb': Error parsing message.

After downloading all the files and the trained model weights manually, I ran transnetv2.py in the inference folder, but it raised an exception:
OSError: Cannot parse file b'/path/to/saved_model.pb': Error parsing message.
The TensorFlow version is v2.1.0.

I also ran the docker command to build an image, following the readme.md in inference. The Dockerfile built successfully, but when I ran the command from the readme.md to test a video, it raised the same exception.

How can I solve this problem? Thanks.

Error when training with IACC.3 dataset

Hi,

I have successfully trained models on the ClipShots, RAI and BBC datasets. However, when I train the model on the IACC.3 dataset, I keep getting the error below:
OP_REQUIRES failed at strided_slice_op.cc:108 : Invalid argument: slice index -1 of dimension 0 out of bounds

Finally, the program ended with the error below:
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_parse_train_sample_529}} {{function_node __inference_parse_train_sample_529}} slice index -1 of dimension 0 out of bounds.
[[{{node StatefulPartitionedCall/cond_3/then/_28/cond_3/PartitionedCall/strided_slice_9}}]]
[[StatefulPartitionedCall]] [Op:IteratorGetNext]

As instructed in the Readme (same as for the other datasets), after downloading the dataset and running 'consolidate_datasets.py', I created the tfrecord files with the 'train' version of 'create_dataset.py'. After that, an error arises during training.

Can you help? What did I do wrong?

Hyperparameters to tune?

Hi,

Thanks for your amazing work!
I'm trying to use your code for a small project I'm working on, trying to detect scene endings in sports.
I have a decent size dataset, yet I can't get past the 80% F1 score.
I was wondering, what model hyperparameters would you suggest I try playing with?

Thank you so much!

PyTorch implementation is way slower than Tensorflow

Hi!

First of all, thanks for the great work!
I have compared both implementations (on CPU) and Torch is much slower.

Unfortunately, I have to use Torch. Do you have any idea how to make it faster?

Thanks

Inference on different image sizes

Thanks for this repo @soCzech, I've been using this for some personal projects and it's incredibly performant for all kinds of videos.

I was wondering if you at any point tried using images of different sizes? I've been wanting to try this for some edge cases that fail with the current settings, however it's not quite as trivial as changing the image size variable - so I was wondering if you might have any advice on how one would do this?

Where can I download the IACC.3 dataset?

Could you please provide the usage of the training code?

Thanks for your great work,

Could you please explain how to use the training code with my own data?

I want to detect gradual changes only. Should I label one frame or multiple frames?

The training detail of the inference model

Hi, thank you for the comprehensive repo.
Maybe I have missed something, but I have a small question about the released inference model. What are the training details of this model? Was it trained on ClipShots, BBC, or RAI?

What's the difference between the single_frame_predictions and all_frame_predictions?

video_frames, single_frame_predictions, all_frame_predictions = \
    model.predict_video("test.mp4")

I see that the function predictions_to_scenes() uses the single-frame predictions, which have the same shape as all_frame_predictions. If we have already predicted whether each frame is a shot boundary, why do we also need the all-frame predictions? Are they the same?

Thank you for your contribution!
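
For context, a sketch of how a per-frame probability array such as single_frame_predictions could be turned into shots by thresholding. This is a simplified stand-in written for illustration, not the repository's predictions_to_scenes(); the 0.5 threshold and the choice to exclude transition frames from shots are assumptions.

```python
def predictions_to_shots(predictions, threshold=0.5):
    """Split frames into shots; frames at or above threshold count as transitions.

    Returns a list of (start, end) frame indices with inclusive ends.
    """
    shots = []
    start = None  # first frame of the shot currently being built
    for i, p in enumerate(predictions):
        if p < threshold:
            if start is None:
                start = i                 # a new shot begins here
        elif start is not None:
            shots.append((start, i - 1))  # shot ended on the previous frame
            start = None
    if start is not None:
        shots.append((start, len(predictions) - 1))
    return shots


print(predictions_to_shots([0.1, 0.1, 0.9, 0.1, 0.1]))  # prints [(0, 1), (3, 4)]
```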

error : Pytorch inference

I am trying to run inference with PyTorch.
TensorFlow 2.1 is no longer available,
so I installed TensorFlow 2.7 and ran the programs.
Then I got the following error.

tensorflow.python.framework.errors_impl.OpError: not an sstable (bad magic number)

shot annotation of RAI dataset not found

Hi, I downloaded RAI from https://drive.google.com/file/d/1YColUfc3ZuCbiAAHHMYRQVBF2yBikj4N/view, but it only contains videos and scene annotations. Where did you find the shot boundary annotations? Thanks a lot!

How to get synthesis dataset

Hi,
In the paper, you mention that synthetic data is used and boosts performance, but I couldn't find the code that renders the transitions.
Could you please provide it?

fps

Did you set a specific fps when extracting frames from each video? From the code, it looks like you use each video's original fps.
How do differences in fps between videos affect the results?

PIL error

Hello, I'm running consolidate_datasets.py and got these errors. Could you please tell me what causes them? Thank you so much.
  File "/Users/z/Desktop/transnetv2/consolidate_datasets.py", line 215, in <module>
    clipshots_dataset(CLIPSHOTS_TRN_txt_files, CLIPSHOTS_TRN_mp4_files, CLIPSHOTS_TRN_target_dir)
  File "/Users/z/Desktop/transnetv2/consolidate_datasets.py", line 208, in clipshots_dataset
    visualize_scenes(video, scenes).save(save_to + ".png")
  File "/Users/z/Desktop/transnetv2/visualization_utils.py", line 55, in visualize_scenes
    draw_end_frame(end)
  File "/Users/z/Desktop/transnetv2/visualization_utils.py", line 33, in draw_end_frame
    draw.rectangle([(w * iw + iw - 1, h * ih), (w * iw + iw - 3, h * ih + ih - 1)], fill=(255, 0, 0))
  File "/Users/z/opt/anaconda3/lib/python3.9/site-packages/PIL/ImageDraw.py", line 292, in rectangle
    self.draw.draw_rectangle(xy, fill, 1)
ValueError: x1 must be greater than or equal to x0

one_hot vs many_hot

Hey

In create_dataset.py, there is a function named scenes2zero_one_representation. It returns two values which, based on your paper, correspond to the network's two heads. In the implementation, they are called one_hot and many_hot.
I ran the function 100,000 times with different randomly generated scene sequences, and in all cases the two returned values were the same! I'm wondering whether there is a reason for giving these values different names, or whether there is a subtle difference I wasn't able to spot.

BTW, here is the code I tested the function with:

import numpy as np

from create_dataset import scenes2zero_one_representation

# create some random sequences
sequences, max_len = [], []
for num_seq in range(4):
    cursor = 0
    sequences.append([])
    for i in range(np.random.randint(10, 15)):
        run_len = np.random.randint(1, 100)
        sequences[-1].append([cursor, cursor + run_len])
        cursor += run_len + 1
    max_len.append(cursor)

# get result of the function
results = [scenes2zero_one_representation(s, m) for s, m in zip(sequences, max_len)]

# check if one_hot and many_hot vectors are different
if all([all(result[0] == result[1]) for result in results]):
    print('All values are the same!')
else:
    print('There is some difference.')

Regards
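
A possible explanation, sketched under assumptions: in the test above every scene starts exactly one frame after the previous one ends, so every transition is a hard cut. If, as the paper describes, one head marks a single middle frame of a transition and the other marks all transition frames, the two label vectors can only differ for gradual transitions that span several frames. The function below is a hypothetical re-implementation written for illustration, not the code in create_dataset.py.

```python
def transition_labels(scenes, n_frames):
    """Return (one_hot, many_hot) lists of 0/1 labels for ordered scenes.

    scenes: list of (start, end) frame indices, end inclusive.
    one_hot marks one middle frame per transition; many_hot marks every
    frame from the end of one scene up to the start of the next.
    """
    one_hot = [0] * n_frames
    many_hot = [0] * n_frames
    prev_end = None
    for start, end in scenes:
        if prev_end is not None:
            for i in range(prev_end, start):
                many_hot[i] = 1
            one_hot[prev_end + (start - prev_end) // 2] = 1
        prev_end = end
    return one_hot, many_hot


# Hard cut: the second scene starts right after the first ends.
hard = transition_labels([(0, 4), (5, 9)], 10)
# Gradual transition: frames 5..8 lie between the two scenes.
gradual = transition_labels([(0, 4), (9, 12)], 13)
print(hard[0] == hard[1], gradual[0] == gradual[1])  # prints: True False
```

Generating test sequences with gaps larger than one frame between scenes would exercise the gradual case.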

TypeError: a bytes-like object is required, not 'tuple'

Hi,

I'm attempting to run a test with "python transnetv2.py test.mp4" from the inference folder and getting this error.

[TransNetV2] Extracting frames from test.mp4
Traceback (most recent call last):
  File "transnetv2.py", line 192, in <module>
    main()
  File "transnetv2.py", line 172, in main
    model.predict_video(file)
  File "transnetv2.py", line 86, in predict_video
    video = np.frombuffer(video_stream, np.uint8).reshape([-1, 27, 48, 3])
TypeError: a bytes-like object is required, not 'tuple'

Not sure where to go from here, let me know if anyone has any ideas.

Cheers
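
The reshape in the traceback expects a flat bytes buffer of raw RGB frames at 48x27 pixels, so whatever the ffmpeg call returned was not that buffer. A small sketch of a guard that makes the failure explicit before the reshape; the helper name `count_frames` is hypothetical and not part of transnetv2.py.

```python
FRAME_BYTES = 27 * 48 * 3  # height * width * RGB channels, as in the reshape


def count_frames(video_stream):
    """Validate a raw RGB buffer and return how many 48x27 frames it holds."""
    if not isinstance(video_stream, (bytes, bytearray)):
        raise TypeError(
            f"expected raw bytes from ffmpeg, got {type(video_stream).__name__}"
        )
    n_bytes = len(video_stream)
    if n_bytes % FRAME_BYTES != 0:
        raise ValueError(f"{n_bytes} bytes is not a whole number of frames")
    return n_bytes // FRAME_BYTES
```

Printing the type of `video_stream` just before line 86 would show what the call actually returned in this environment.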

Can I get pretrained weights (.h5) file?

I've tested inference using the pb file you provided, and it worked well. Thanks for the repo :)
I then tried to run the evaluation code, but couldn't because the model format is different.
Could I get the pretrained weights (.h5) file for my research?