
soczech / transnetv2

Stars: 422, Watchers: 9, Forks: 84, Size: 254 KB

TransNet V2: Shot Boundary Detection Neural Network

Home Page: https://arxiv.org/abs/2008.04838

License: MIT License

Languages: Python 99.33%, Dockerfile 0.59%, PureBasic 0.08%
Topics: shot-boundary-detection, shot-detection

Contributors: soczech

transnetv2's Issues

Importing a SavedModel with tf.saved_model.load requires a 'tags=' argument

Hi Tomas,

Awesome work you've done here. I really appreciate it.

I followed your instructions and set up a conda environment with python=3.6, tensorflow=2.1, pytorch=1.7.1, and cudatoolkit=10.1.

The following error occurs

ValueError: Importing a SavedModel with tf.saved_model.load requires a 'tags=' argument if there is more than one MetaGraph. Got 'tags=None', but there are 0 MetaGraphs in the SavedModel with tag sets []. Pass a 'tags=' argument to load this SavedModel.

when I tried to run

python transnetv2.py some_video.mp4

as well as

python convert_weights.py

I would love to hear from you on how to resolve this error.

Thank you again!

About weight download

Is there a link where I can download the weights manually? The download always times out when my server pulls them. I look forward to hearing from you soon. Thank you very much.
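If the automatic download keeps timing out, one workaround is to fetch each weight file yourself with a longer timeout and a few retries. The sketch below uses only the Python standard library; `download_with_retries` is a hypothetical helper, and the URL passed to it must be the real location of a weight file (no such URL is given in this thread).

```python
import os
import shutil
import time
import urllib.request

def download_with_retries(url, dest, attempts=5, timeout=120, backoff=2.0):
    """Download url to dest, retrying on network errors; returns dest."""
    part = dest + ".part"  # temporary name, so a failed run never leaves a truncated dest
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, open(part, "wb") as out:
                shutil.copyfileobj(resp, out)
            os.replace(part, dest)
            return dest
        except OSError:  # URLError and socket timeouts are both OSError subclasses
            if attempt == attempts:
                raise
            time.sleep(backoff ** attempt)  # wait 2, 4, 8, ... seconds between attempts
```

The `.part` file plus `os.replace` keeps a half-finished download from being mistaken for a complete weight file.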

Working with longer dissolves and 60fps videos?

Hi Tomas,

Thank you so much for putting this code online. You did an excellent job on a model that is still considered state of the art two years after it was posted!

I'm testing the model on a certain dataset and wanted to ask a few questions:

  1. Could you share the .h5 weights file from your training? I want to try fine-tuning the model on my data to see whether that gives better accuracy than training on my dataset alone.
  2. I'm working with 60 fps video, and my dissolves sometimes last ~70 frames. What would you suggest changing so that the augmentation produces longer transitions? More generally, would you suggest any changes for higher-fps videos and longer transitions?
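One option for 60 fps footage, offered as an untested assumption rather than advice from the authors, is to drop frames so the input is closer to a lower frame rate; 25 fps is used below purely as an example target. At that rate a ~70-frame dissolve shrinks to about 29 frames. `resample_indices` is a hypothetical helper that picks which source frames to keep:

```python
import numpy as np

def resample_indices(n_frames, src_fps, target_fps):
    """Indices of source frames to keep when reducing src_fps to target_fps."""
    if target_fps > src_fps:
        raise ValueError("this sketch only drops frames; it cannot upsample")
    n_out = int(n_frames * target_fps // src_fps)
    # Output frame k sits at source position k * src_fps / target_fps; take the frame at or before it.
    return np.floor(np.arange(n_out) * src_fps / target_fps).astype(np.int64)

idx = resample_indices(600, 60, 25)  # 10 s of 60 fps video
print(len(idx), idx[:5])             # 250 [0 2 4 7 9]
# A 70-frame dissolve at 60 fps spans about 70 * 25 / 60, i.e. roughly 29 frames, after resampling.
```

Applying `frames[idx]` before inference or dataset creation is the intended use; whether this helps accuracy would have to be measured on your data.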

I'll really appreciate your help! Happy holidays!

about gin

Although I installed gin successfully, I still cannot import gin. Do you know why?
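Two common causes are worth ruling out (assumptions, since the screenshot is not available here): the package was installed into a different Python environment than the one running the script, or an unrelated package named `gin` was installed instead of `gin-config`, which is the PyPI name of the library that is imported as `gin`. A small diagnostic sketch (`diagnose_gin` is a hypothetical helper):

```python
import importlib.util
import sys

def diagnose_gin():
    """Explain why `import gin` might fail for the interpreter running this script."""
    spec = importlib.util.find_spec("gin")
    if spec is None:
        # Often pip installed into a different Python/conda env than this one.
        return f"missing: no 'gin' module for {sys.executable}; try `{sys.executable} -m pip install gin-config`"
    import gin
    if hasattr(gin, "configurable") and hasattr(gin, "parse_config_file"):
        return f"ok: gin-config imported from {spec.origin}"
    return f"wrong package: the 'gin' module at {spec.origin} is not gin-config"

print(diagnose_gin())
```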

How to configure the transnetv2.gin and reproduce the F1 of 77.9 on the ClipShots test set?

Hi Tomáš,

Your work helps me a lot in understanding your great paper! Thank you so much!

I downloaded the ClipShots training and training-transitions datasets and processed them according to https://github.com/soCzech/TransNetV2/blob/master/training/consolidate_datasets.py and https://github.com/soCzech/TransNetV2/blob/master/training/create_dataset.py

I downloaded the ClipShots test dataset and processed it the same way.

I also downloaded the IACC.3 dataset and processed it with the "train" type.

I added the ClipShots training, training-transitions and IACC.3 data to "options.trn_files" in https://github.com/soCzech/TransNetV2/blob/master/configs/transnetv2.gin, and added ClipShots test to "options.tst_files". I also changed "options.n_epochs" to 50, as indicated in the paper.

However, I only obtain an F1 of 0.74. Could you please give more training details and instructions on how to reproduce 77.9 on the test set?

What do the file names in "options.tst_files" mean, and how are these files generated?

I also used the pretrained weights in https://github.com/soCzech/TransNetV2/tree/master/inference/transnetv2-weights to test on the ClipShots test dataset, by setting "options.restore" and "options.test_only" to True in https://github.com/soCzech/TransNetV2/blob/master/configs/transnetv2.gin. I only get an F1 of 0.2545, so I cannot reproduce 77.9 that way either.

Thank you so much for your help!

Wentao

Cannot parse file b'/TransNetV2/inference/transnetv2-weights/saved_model.pb': Error parsing message.

Hi, I tried to use TransNetV2 and followed these steps:

from transnetv2 import TransNetV2

# location of learned weights is automatically inferred
# add argument model_dir="/path/to/transnetv2-weights/" to TransNetV2() if it fails
model = TransNetV2()
video_frames, single_frame_predictions, all_frame_predictions = \
    model.predict_video("video.mp4")

But it fails with the error in the title: the .pb file cannot be parsed. Am I missing something? Could you please help with that? Thank you.

key-frame extraction

Great work! I have a question: how can I extract key frames together with the caption of each frame? Alternatively, how can I get the timestamps of the key frames from the network, so that I can then look up the captions?
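The network outputs frame indices, so timestamps follow from the video's frame rate (time = frame / fps), which has to be read from the video separately. A minimal sketch, assuming `scenes` is a list of [start, end] frame pairs such as predictions_to_scenes returns, taking the middle frame of each scene as its key frame (a simple heuristic, not something the model provides), and looking captions up in hypothetical (start_s, end_s, text) cues, for example parsed from a subtitle file:

```python
def keyframes_with_captions(scenes, fps, cues):
    """Return (key_frame, time_s, caption) for the middle frame of each scene.

    scenes: [[start_frame, end_frame], ...]
    fps:    frame rate of the video (hypothetical input, read it from the file)
    cues:   [(start_s, end_s, text), ...] caption cues (hypothetical format)
    """
    results = []
    for start, end in scenes:
        key_frame = (int(start) + int(end)) // 2
        t = key_frame / fps
        # First caption whose time interval contains the key frame, or None.
        caption = next((text for s, e, text in cues if s <= t < e), None)
        results.append((key_frame, t, caption))
    return results

scenes = [[0, 49], [50, 99]]
cues = [(0.0, 1.5, "hello"), (2.5, 4.0, "goodbye")]
for key_frame, t, caption in keyframes_with_captions(scenes, 25.0, cues):
    print(key_frame, round(t, 2), caption)  # 24 0.96 hello / 74 2.96 goodbye
```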

ffmpeg._run.Error

Traceback (most recent call last):
  File "/home/tom/projects/Studium/Studienarbeit/cutting/TransNetV2/inference/transnetv2.py", line 193, in <module>
    main()
  File "/home/tom/projects/Studium/Studienarbeit/cutting/TransNetV2/inference/transnetv2.py", line 173, in main
    model.predict_video(file)
  File "/home/tom/projects/Studium/Studienarbeit/cutting/TransNetV2/inference/transnetv2.py", line 83, in predict_video
    video_stream, err = ffmpeg.input(video_fn).output(
  File "/home/tom/projects/Studium/Studienarbeit/cutting/env/lib/python3.9/site-packages/ffmpeg/_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

I get an ffmpeg error when I run the following command:

python transnetv2.py /mnt/e/Studium/Studienarbeit/Videos/2021/reg/17/c597512d-b37c-11eb-ba8a-ecb6fe06b3b0/highlightsVideo/video/video.mp4 [--visualize]

This is my ffmpeg:

ffmpeg version 4.2.4-1ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
  configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared


Is there any problem with my ffmpeg build?
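The exception text only says "see stderr output for detail"; with ffmpeg-python, the real message is in the `stderr` attribute of the raised `ffmpeg.Error` (when stderr was captured). A quick way to see it, sketched here with a hypothetical `run_and_report` helper, is to rerun a comparable decode directly and print what ffmpeg writes to stderr; the 48x27 rgb24 output mirrors the frame shape transnetv2.py reshapes to. Note also that square brackets in usage strings normally mark an optional flag: typed literally, `[--visualize]` is probably handed to the script as an extra file name, which ffmpeg would then fail to open (a likely but unconfirmed cause).

```python
import subprocess
import sys

def run_and_report(cmd):
    """Run a command and return (exit code, decoded stderr) so ffmpeg's real message is visible."""
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
    return proc.returncode, proc.stderr.decode("utf-8", errors="replace")

if __name__ == "__main__" and len(sys.argv) > 1:
    video = sys.argv[1]
    # Decode to raw 48x27 RGB frames on stdout (discarded); only errors are printed.
    code, err = run_and_report(["ffmpeg", "-v", "error", "-i", video,
                                "-f", "rawvideo", "-pix_fmt", "rgb24", "-s", "48x27", "-"])
    print("exit code:", code)
    print(err or "(no stderr output)")
```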

Weights Loading

Thank you for the repo! I would like to compare TransNetV2 with TransNet on some videos; however, I have an issue loading the weights. As described in inference/README.md, I just run:

python transnetv2.py test.mp4

I get the following error, starting at transnetv2.py line 17:

    saved_model.ParseFromString(file_content)
google.protobuf.message.DecodeError: Error parsing message

I've noticed that the transnetv2-weights directory is only 8.4 KB. Is that correct?
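A weights directory of only ~8.4 KB suggests (an assumption, not confirmed in this thread) that the repository was cloned without Git LFS, so saved_model.pb and the variables files are small text pointer files rather than the real data; parsing such a pointer as a protobuf or checkpoint would plausibly fail the way these tracebacks show. Git LFS pointer files start with the line `version https://git-lfs.github.com/spec/v1`, which makes them easy to detect. `find_lfs_pointers` is a hypothetical helper:

```python
from pathlib import Path

LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"

def find_lfs_pointers(weights_dir):
    """Return files under weights_dir that are Git LFS pointer stubs instead of real data."""
    stubs = []
    for path in sorted(Path(weights_dir).rglob("*")):
        if path.is_file():
            with path.open("rb") as f:
                if f.read(len(LFS_HEADER)) == LFS_HEADER:
                    stubs.append(path)
    return stubs

# Example: print(find_lfs_pointers("inference/transnetv2-weights"))
```

If it reports any files, running `git lfs install` followed by `git lfs pull` inside the cloned repository should replace the stubs with the actual weights.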

how to generate .npy data for evaluation?

Hello, evaluate.py collects its input with files = glob.glob(os.path.join(args.directory, "*.npy")). Do we need to convert the mp4 files to .npy? How can the .npy data be generated? Looking forward to your reply, thanks!
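The thread does not say what evaluate.py expects inside each .npy file. If it expects one array of decoded frames per video (an assumption; it may instead expect saved predictions, so check what evaluate.py loads before relying on this), such a file can be written with `np.save`. `save_frames_as_npy` is a hypothetical helper; the 27×48×3 frame shape is taken from the reshape in transnetv2.py, and the frames could come from the first value returned by `model.predict_video`.

```python
from pathlib import Path

import numpy as np

def save_frames_as_npy(frames, video_path, out_dir):
    """Save an [n_frames, 27, 48, 3] uint8 frame array as <out_dir>/<video name>.npy (assumed format)."""
    frames = np.asarray(frames, dtype=np.uint8)
    if frames.ndim != 4 or frames.shape[1:] != (27, 48, 3):
        raise ValueError(f"unexpected frame array shape {frames.shape}")
    out_path = Path(out_dir) / (Path(video_path).stem + ".npy")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    np.save(out_path, frames)
    return out_path

# Possible use with the inference model (names from the inference README example):
# video_frames, _, _ = model.predict_video("video.mp4")
# save_frames_as_npy(video_frames, "video.mp4", "npy_dir")
```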

ValueError

  File "/home/y202202005/workspace/TransNetV2/inference/transnetv2.py", line 165, in main
    model = TransNetV2(args.weights)
  File "/home/y202202005/workspace/TransNetV2/inference/transnetv2.py", line 18, in __init__
    self._model = tf.saved_model.load(model_dir)
  File "/home/y202202005/.conda/envs/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 603, in load
    return load_internal(export_dir, tags, options)
  File "/home/y202202005/.conda/envs/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 649, in load_internal
    root = load_v1_in_v2.load(export_dir, tags)
  File "/home/y202202005/.conda/envs/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 263, in load
    return loader.load(tags=tags)
  File "/home/y202202005/.conda/envs/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 188, in load
    meta_graph_def = self.get_meta_graph_def_from_tags(tags)
  File "/home/y202202005/.conda/envs/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 76, in get_meta_graph_def_from_tags
    raise ValueError(
ValueError: Importing a SavedModel with tf.saved_model.load requires a 'tags=' argument if there is more than one MetaGraph. Got 'tags=None', but there are 0 MetaGraphs in the SavedModel with tag sets []. Pass a 'tags=' argument to load this SavedModel.

DecodeError: Error parsing message

Hello, I'm trying to run the model, but I get the following error:

020-06-22 00:20:45.880699: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
[TransNetV2] Using weights from transnetv2-weights/.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 98, in parse_saved_model
    saved_model.ParseFromString(file_content)
google.protobuf.message.DecodeError: Error parsing message

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "transnetv2.py", line 188, in <module>
    main()
  File "transnetv2.py", line 160, in main
    model = TransNetV2(args.weights)
  File "transnetv2.py", line 17, in __init__
    self._model = tf.saved_model.load(model_dir)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 578, in load
    return load_internal(export_dir, tags)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/load.py", line 588, in load_internal
    loader_impl.parse_saved_model_with_debug_info(export_dir))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 56, in parse_saved_model_with_debug_info
    saved_model = _parse_saved_model(export_dir)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 101, in parse_saved_model
    raise IOError("Cannot parse file %s: %s." % (path_to_pb, str(e)))
OSError: Cannot parse file b'transnetv2-weights/saved_model.pb': Error parsing message.

Any idea what could be wrong? Thanks.

pytorch weight

Hi, could you provide converted PyTorch weights? TensorFlow 2.1 is no longer available.

About weight transfer

The released model is saved in .pb (SavedModel) format, while the training code saves the model in HDF5 (.h5) format. How was the .h5 file converted to .pb? Was it saved directly in .pb format during training, or converted afterward?

path to scenes gt for -mapping in create_train_dataset

Hi,
First of all, thank you very much for this work.
I have a question that may be a bit silly, but I couldn't find an answer. I would like to run the code for create_train_dataset, but I can't figure out what to pass as the mapping_fn parameter. Could you tell me what /path/to/scenes/gt refers to, please? :) I really have no idea which file to use...

OpError: not an sstable (bad magic number)

Hi, I tried to run the code as follows.

from transnetv2 import TransNetV2
model = TransNetV2()

The first time, it showed a parse error. I re-downloaded the .pb model, and now it shows OpError: not an sstable (bad magic number). I'm not sure what happened.

I also tried TransNet (https://github.com/soCzech/TransNet). It works pretty well, and I like the visualization. Could you please save TransNet and its weights as a .pb model? I want to use it with OpenCV and C++. Do you have any suggestions on how to use it with OpenCV's dnn module? Given a video, how should I prepare the input for the model? Thank you very much.
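On the input side, each frame would need converting from OpenCV's BGR order to RGB and resizing to 48×27 (`cv2.resize` takes `(width, height)`, so `(48, 27)`); 27×48×3 is the frame shape transnetv2.py reshapes to. The windowing below (100-frame windows advanced by 50 frames, 25 frames of context on each side, edges padded with copies of the first and last frame) is an assumption based on the TransNetV2 inference script, not something verified for TransNet v1 or for OpenCV's dnn module. `sliding_windows` is a hypothetical helper:

```python
import numpy as np

WINDOW, CONTEXT, STEP = 100, 25, 50  # assumed window layout, see the note above

def sliding_windows(frames):
    """Yield [1, WINDOW, H, W, 3] batches; keep predictions for frames CONTEXT:CONTEXT + STEP of each."""
    frames = np.asarray(frames)
    n = len(frames)
    remainder = n % STEP
    pad_end = CONTEXT + STEP - (remainder if remainder else STEP)
    padded = np.concatenate([
        np.repeat(frames[:1], CONTEXT, axis=0),   # copies of the first frame
        frames,
        np.repeat(frames[-1:], pad_end, axis=0),  # copies of the last frame
    ], axis=0)
    for ptr in range(0, len(padded) - WINDOW + 1, STEP):
        yield padded[np.newaxis, ptr:ptr + WINDOW]
```

Concatenating the middle 50 predictions of every window and truncating to the number of frames gives one score per input frame.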

training.py can't load models.py

Hi,

Thank you so much for putting this model out. Excellent work!
I'm trying to train the model and ran into a problem that points to the models.py file.
When I run training.py with the gin file, I get this error message:

Traceback (most recent call last):
  File "C:\videoseg\TransNetV2\training\training.py", line 10, in <module>
    import models
  File "C:\videoseg\TransNetV2\training\models.py", line 168, in <module>
    @gin.configurable(blacklist=["name"])
TypeError: configurable() got an unexpected keyword argument 'blacklist'

Do you have a sense of what could be the problem?
Thanks!
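Recent gin-config releases renamed the `blacklist`/`whitelist` keywords of `gin.configurable` to `denylist`/`allowlist`, which would produce exactly this TypeError with a newer gin. The simplest fixes are to replace `blacklist=` with `denylist=` in models.py or to install an older gin-config that still accepts `blacklist` (which version to pin is left to you). If the code must work with both, a hypothetical `configurable_excluding` helper can pick whichever keyword the installed gin accepts:

```python
import inspect

def configurable_excluding(gin_module, names):
    """Build a gin.configurable decorator that excludes `names`, whatever keyword the installed gin uses."""
    params = inspect.signature(gin_module.configurable).parameters
    keyword = "denylist" if "denylist" in params else "blacklist"
    return gin_module.configurable(**{keyword: list(names)})

# Hypothetical use in training/models.py instead of @gin.configurable(blacklist=["name"]):
# @configurable_excluding(gin, ["name"])
# class SomeLayer(tf.keras.layers.Layer): ...
```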

extract frames with captions

First, this work is great. I want to ask how I can extract key frames from a video together with the caption of each frame.

Publish to pypi

Hi @soCzech

Thank you for creating this repo. Have you considered publishing this library to PyPI?

Wenbing

OSError: Cannot parse file b'./saved_model.pb': Error parsing message.

After downloading all the files and the trained model/weights manually, I ran transnetv2.py in the inference folder, but it raised an exception:
OSError: Cannot parse file b'/path/to/saved_model.pb': Error parsing message.
The TensorFlow version is v2.1.0.

I also followed the README.md in inference to build a Docker image. The Dockerfile built successfully, but when I ran the command from README.md to test a video, it still raised the same exception.

How can I solve this problem? Thanks.

Error when training with IACC.3 dataset

Hi,

I have successfully trained models on the ClipShots, RAI and BBC datasets. However, when I train the model on the IACC.3 dataset, I keep getting the error below:
P_REQUIRES failed at strided_slice_op.cc:108 : Invalid argument: slice index -1 of dimension 0 out of bounds

Finally, the program ends with this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_parse_train_sample_529}} {{function_node __inference_parse_train_sample_529}} slice index -1 of dimension 0 out of bounds.
[[{{node StatefulPartitionedCall/cond_3/then/_28/cond_3/PartitionedCall/strided_slice_9}}]]
[[StatefulPartitionedCall]] [Op:IteratorGetNext]

As instructed in the README (same as for the other datasets), I downloaded the dataset, ran 'consolidate_datasets.py', and created TFRecords with the 'train' mode of 'create_dataset.py'. After that, an error arises during training.

Can you help? What did I do wrong?
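TensorFlow reports "slice index -1 of dimension 0 out of bounds" when a tensor with zero rows is indexed with `[-1]`, so one plausible cause (an assumption; the thread does not confirm it) is an IACC.3 entry that produced no frames or no scene annotations after consolidation. A hypothetical pre-check over the consolidated annotations, to run before creating the TFRecords (the dictionary structures are assumptions, adapt them to however you load the data):

```python
def find_bad_annotations(scenes_by_video, n_frames_by_video):
    """Flag videos whose annotations could make a per-video slice like x[-1] hit an empty tensor.

    scenes_by_video:   {video_id: [[start, end], ...]}  (hypothetical structure)
    n_frames_by_video: {video_id: number of decoded frames}
    """
    problems = {}
    for video, scenes in scenes_by_video.items():
        n_frames = n_frames_by_video.get(video, 0)
        if n_frames == 0:
            problems[video] = "no decoded frames"
        elif len(scenes) == 0:
            problems[video] = "no scenes"
        elif any(start < 0 or start > end or end >= n_frames for start, end in scenes):
            problems[video] = "scene outside [0, n_frames)"
    return problems
```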

Hyperparameters to tune?

Hi,

Thanks for your amazing work!
I'm trying to use your code for a small project that detects scene endings in sports.
I have a decent-sized dataset, yet I can't get past an F1 score of 80%.
Which model hyperparameters would you suggest I try tuning?

Thank you so much!

PyTorch implementation is way slower than Tensorflow

Hi!

First of all, thanks for the great work!
I have compared both implementations (on CPU) and Torch is much slower.

Unfortunately, I have to use Torch, so do you have any ideas on how to make it faster?

Thanks

Inference on different image sizes

Thanks for this repo, @soCzech! I've been using it for some personal projects, and it performs incredibly well on all kinds of videos.

Did you ever try using images of different sizes? I've wanted to try this for some edge cases that fail with the current settings, but it's not as trivial as changing the image size variable, so I was wondering whether you have any advice on how to do it.

The training detail of the inference model

Hi, thank you for the comprehensive repo.
I may have missed something, but I have a small question about the released inference model: what are its training details? Was it trained on ClipShots, BBC, or RAI?

What's the difference between the single_frame_predictions and all_frame_predictions?

video_frames, single_frame_predictions, all_frame_predictions = \
    model.predict_video("test.mp4")

I see that predictions_to_scenes() uses the single-frame predictions. The single-frame predictions have the same shape as all_frame_predictions. Since we already predict whether each frame is a shot boundary, why do we also need the all-frame predictions? Are they the same?

Thank you for your contribution!
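To our understanding of the paper (not confirmed in this thread), the two outputs come from the network's two heads: single_frame_predictions from the head trained to flag one frame per transition, and all_frame_predictions from an auxiliary head trained to flag every frame of a transition, so they mainly differ on gradual transitions. The sketch below shows how thresholded single-frame scores can be turned into scenes; `predictions_to_scenes_sketch` is an illustrative reimplementation, not the repository's predictions_to_scenes, and its boundary conventions may differ.

```python
import numpy as np

def predictions_to_scenes_sketch(predictions, threshold=0.5):
    """Split frames into scenes at frames whose score exceeds threshold.

    Transition frames are left out of the scenes; returns [start, end] pairs (inclusive).
    """
    is_transition = np.asarray(predictions).reshape(-1) > threshold
    scenes, start = [], None
    for i, flag in enumerate(is_transition):
        if not flag and start is None:
            start = i                      # a new scene begins
        elif flag and start is not None:
            scenes.append([start, i - 1])  # the scene ends before the transition frame
            start = None
    if start is not None:
        scenes.append([start, len(is_transition) - 1])
    return np.array(scenes, dtype=np.int32).reshape(-1, 2)

print(predictions_to_scenes_sketch([0.1, 0.1, 0.9, 0.1, 0.1, 0.1]).tolist())  # [[0, 1], [3, 5]]
```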

error : Pytorch inference

I'm trying to run inference with PyTorch.
TensorFlow 2.1 is no longer available,
so I installed TensorFlow 2.7 and ran the programs.
Then I got the following error:

tensorflow.python.framework.errors_impl.OpError: not an sstable (bad magic number)

How to get synthesis dataset

Hi,
In the paper, you mention that synthetic data is used and boosts performance, but I couldn't find the code that renders the transitions.
Could you please provide the code to render transitions?
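The authors' rendering code is not part of this thread. As a generic illustration only, a synthetic dissolve can be rendered as a linear cross-fade between the tail of one shot and the head of another, with a 0/1 label per frame marking the blended frames. `render_dissolve` is a hypothetical helper, and the linear alpha ramp is an assumption, not necessarily what the paper used.

```python
import numpy as np

def render_dissolve(shot_a, shot_b, length):
    """Join two [n, H, W, 3] uint8 shots with a linear cross-fade over `length` frames.

    Returns the new clip and a 0/1 label per frame marking the blended frames.
    """
    if not 1 <= length <= min(len(shot_a), len(shot_b)):
        raise ValueError("length must be between 1 and the shorter shot's length")
    # Blend weights strictly between 0 and 1, so every blended frame mixes both shots.
    alpha = np.linspace(0.0, 1.0, length + 2, dtype=np.float32)[1:-1].reshape(-1, 1, 1, 1)
    blended = (1.0 - alpha) * shot_a[-length:].astype(np.float32) + alpha * shot_b[:length].astype(np.float32)
    clip = np.concatenate([shot_a[:-length], np.round(blended).astype(np.uint8), shot_b[length:]], axis=0)
    labels = np.zeros(len(clip), dtype=np.uint8)
    labels[len(shot_a) - length:len(shot_a)] = 1
    return clip, labels
```

The clip is `len(shot_a) + len(shot_b) - length` frames long, because the blended frames replace the overlapping tail and head.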

fps

Did you use a specific fps when extracting frames from each video? As far as I can tell, the code uses each video's original fps.
How do fps differences between videos affect the results?

PIL error

Hello, I'm running consolidate_datasets.py and got the following errors. Could you please tell me what causes them? Thank you so much!
  File "/Users/z/Desktop/transnetv2/consolidate_datasets.py", line 215, in <module>
    clipshots_dataset(CLIPSHOTS_TRN_txt_files, CLIPSHOTS_TRN_mp4_files, CLIPSHOTS_TRN_target_dir)
  File "/Users/z/Desktop/transnetv2/consolidate_datasets.py", line 208, in clipshots_dataset
    visualize_scenes(video, scenes).save(save_to + ".png")
  File "/Users/z/Desktop/transnetv2/visualization_utils.py", line 55, in visualize_scenes
    draw_end_frame(end)
  File "/Users/z/Desktop/transnetv2/visualization_utils.py", line 33, in draw_end_frame
    draw.rectangle([(w * iw + iw - 1, h * ih), (w * iw + iw - 3, h * ih + ih - 1)], fill=(255, 0, 0))
  File "/Users/z/opt/anaconda3/lib/python3.9/site-packages/PIL/ImageDraw.py", line 292, in rectangle
    self.draw.draw_rectangle(xy, fill, 1)
ValueError: x1 must be greater than or equal to x0
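The traceback shows draw_end_frame passing x0 = w * iw + iw - 1 and x1 = w * iw + iw - 3, so the second corner lies to the left of the first. Newer Pillow versions reject that with this ValueError (older ones apparently tolerated it). One hedged fix is to order the corners before drawing; `ordered_box` is a hypothetical helper, and swapping the two x values by hand in visualization_utils.py works just as well.

```python
def ordered_box(p0, p1):
    """Return [(left, top), (right, bottom)] for any two opposite corners of a rectangle."""
    (x0, y0), (x1, y1) = p0, p1
    return [(min(x0, x1), min(y0, y1)), (max(x0, x1), max(y0, y1))]

# Possible use inside visualization_utils.draw_end_frame (names as in the traceback):
# draw.rectangle(ordered_box((w * iw + iw - 1, h * ih), (w * iw + iw - 3, h * ih + ih - 1)),
#                fill=(255, 0, 0))
```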

one_hot vs many_hot

Hey

In create_dataset.py, there is a function named scenes2zero_one_representation. It returns two values which, based on your paper, correspond to the network's two heads. In the implementation, they are called one_hot and many_hot.
I ran the function 100,000 times with different randomly generated scene sequences, and in every case the two returned values were identical! Is there a reason for giving these values different names, or is there a subtle difference I wasn't able to spot?

BTW, here is the code I tested the function with:

import numpy as np

from create_dataset import scenes2zero_one_representation

# create some random sequences
sequences, max_len = [], []
for num_seq in range(4):
    cursor = 0
    sequences.append([])
    for i in range(np.random.randint(10, 15)):
        run_len = np.random.randint(1, 100)
        sequences[-1].append([cursor, cursor + run_len])
        cursor += run_len + 1
    max_len.append(cursor)

# get result of the function
results = [scenes2zero_one_representation(s, m) for s, m in zip(sequences, max_len)]

# check if one_hot and many_hot vectors are different
if all([all(result[0] == result[1]) for result in results]):
    print('All values are the same!')
else:
    print('There is some difference.')

Regards
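Note that the random sequences in the test above start each scene immediately after the previous one (`cursor += run_len + 1`), so every transition is a one-frame hard cut. If, as the names suggest, one_hot marks a single frame per transition and many_hot marks every frame of it (an assumption; create_dataset.py is not reproduced here), the two can only coincide for hard cuts. The hypothetical `transition_targets` below illustrates that reading, with and without a gap between scenes:

```python
import numpy as np

def transition_targets(scenes, n_frames):
    """Hypothetical targets: one frame per transition (one_hot) vs. all transition frames (many_hot).

    A transition is taken to span from the last frame of one scene up to the frame
    before the next scene starts, so a hard cut covers exactly one frame.
    """
    one_hot = np.zeros(n_frames, dtype=np.uint8)
    many_hot = np.zeros(n_frames, dtype=np.uint8)
    for (_, prev_end), (next_start, _) in zip(scenes[:-1], scenes[1:]):
        first, last = prev_end, next_start - 1
        one_hot[(first + last) // 2] = 1  # middle frame only
        many_hot[first:last + 1] = 1      # every frame of the transition
    return one_hot, many_hot

hard = transition_targets([[0, 9], [10, 19]], 20)  # adjacent scenes: hard cut
soft = transition_targets([[0, 9], [15, 24]], 25)  # frames 10-14 belong to neither scene
print((hard[0] == hard[1]).all())                   # True
print(soft[0].sum(), soft[1].sum())                 # 1 6
```

Adding a gap between generated scenes in the test should therefore be enough to make the two vectors differ, if this reading of the function is right.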

TypeError: a bytes-like object is required, not 'tuple'

Hi,

I'm attempting to run a test with "python transnetv2.py test.mp4" from the inference folder, and I'm getting this error:

[TransNetV2] Extracting frames from test.mp4
Traceback (most recent call last):
  File "transnetv2.py", line 192, in <module>
    main()
  File "transnetv2.py", line 172, in main
    model.predict_video(file)
  File "transnetv2.py", line 86, in predict_video
    video = np.frombuffer(video_stream, np.uint8).reshape([-1, 27, 48, 3])
TypeError: a bytes-like object is required, not 'tuple'

I'm not sure where to go from here; let me know if anyone has any ideas.

Cheers
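With ffmpeg-python, `.run(capture_stdout=True)` returns an `(out, err)` tuple and the frame bytes are in `out`. Receiving a tuple at `np.frombuffer` suggests that the value was wrapped one level deeper than expected, for example because a different package providing an `ffmpeg` module is imported in place of ffmpeg-python (a guess; `pip show ffmpeg-python` and `pip list` can confirm what is installed). A defensive conversion sketch (`frames_from_raw` is a hypothetical helper) that fails with a clearer message:

```python
import numpy as np

def frames_from_raw(raw, height=27, width=48):
    """Convert raw rgb24 bytes from ffmpeg into an [n, height, width, 3] uint8 array."""
    if isinstance(raw, tuple):
        raise TypeError(f"expected bytes, got a tuple of {len(raw)} items; "
                        "unpack the (stdout, stderr) result of .run() first")
    frame_bytes = height * width * 3
    if len(raw) % frame_bytes != 0:
        raise ValueError(f"{len(raw)} bytes is not a whole number of {width}x{height} RGB frames")
    return np.frombuffer(raw, np.uint8).reshape([-1, height, width, 3])
```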

Can I get pretrained weights (.h5) file?

I've tested inference using the .pb file you provided, and it worked well. Thanks for the repo :)
Then I tried to run the evaluation code, but couldn't, because the model format is different.
Could I get the pretrained weights (.h5) file for my research?
