facebookresearch / pytorchvideo

A deep learning library for video understanding research.

Home Page: https://pytorchvideo.org/

License: Apache License 2.0

Shell 0.03% Python 94.92% Jupyter Notebook 3.86% JavaScript 0.84% CSS 0.35%

pytorchvideo's People

Contributors

abearman, amyreese, archen2019, bxiong1202, chayryali, datumbox, dulinriley, ebyrne, facebook-github-bot, feichtenhofer, haooooooqi, kalyanvasudev, liyilui, longlouisly, lyttonhao, miguelmartin75, mvsfb, nateraw, nicolashug, nikhilaravi, r-barnes, rangilyu, stevenyoo-meta, tangbinh, thatch, tullie, wgouyang, xwl789, yiwen-song, zhiqwang


pytorchvideo's Issues

Stacking tensors without same size

Hi, I'm following the tutorial "Training a PyTorchVideo classification model" and I don't seem to be loading the data correctly.

I'm using Google Colab, and my Kinetics-400 data is in my Google Drive. I've preprocessed the videos so that they are all rescaled to a height of 256 pixels.

My DataLoader is implemented the same way as described in the tutorial:

class KineticsDataModule(pytorch_lightning.LightningDataModule):
    """
    This LightningDataModule implementation constructs a PyTorchVideo Kinetics dataset for both
    the train and val partitions. It defines each partition's augmentation and
    preprocessing transforms and configures the PyTorch DataLoaders.
    """

    # Dataset configuration
    _DATA_PATH = '/content/drive/MyDrive/Datasets/Kinetics400/'
    _CLIP_DURATION = 2  # Duration of sampled clip for each video
    _BATCH_SIZE = 8
    _NUM_WORKERS = 8  # Number of parallel processes fetching data


    def train_dataloader(self):
        """
        Create the Kinetics train partition from the list of video labels
        in {self._DATA_PATH}/train.csv. Add transform that subsamples and
        normalizes the video before applying the scale, crop and flip augmentations.
        """
        train_transform = Compose(
            [
            ApplyTransformToKey(
              key="video",
              transform=Compose(
                  [
                    UniformTemporalSubsample(8),
                    Normalize((0.45, 0.45, 0.45), (0.225, 0.225, 0.225)),
                    RandomShortSideScale(min_size=256, max_size=320),
                    RandomCrop(244),
                    RandomHorizontalFlip(p=0.5),
                  ]
                ),
              ),
            ]
        )
        train_dataset = pytorchvideo.data.Kinetics(
              data_path=os.path.join(self._DATA_PATH, "train.csv"),
              clip_sampler=pytorchvideo.data.make_clip_sampler("random", self._CLIP_DURATION),
              transform=train_transform
        )
        return torch.utils.data.DataLoader(
            train_dataset,
            batch_size=self._BATCH_SIZE,
            num_workers=self._NUM_WORKERS,
        )

    def val_dataloader(self):
        """
        Create the Kinetics val partition from the list of video labels
        in {self._DATA_PATH}/val.csv. Add transform that subsamples and
        normalizes the video before applying the scale.
        """
        val_transform = Compose(
            [
            ApplyTransformToKey(
              key="video",
              transform=Compose(
                  [
                    UniformTemporalSubsample(8),
                    Normalize((0.45, 0.45, 0.45), (0.225, 0.225, 0.225)),
                  ]
                ),
              ),
            ]
        )
        val_dataset = pytorchvideo.data.Kinetics(
            data_path=os.path.join(self._DATA_PATH, "val.csv"),
            clip_sampler=pytorchvideo.data.make_clip_sampler("uniform", self._CLIP_DURATION),
            transform=val_transform
        )

        return torch.utils.data.DataLoader(
            val_dataset,
            batch_size=self._BATCH_SIZE,
            num_workers=self._NUM_WORKERS,
        )

I built a default ResNet just like in the tutorial and followed it up to the training step; I then run a Colab cell containing only train() to call the train() function.

Even though I'm randomly cropping to 244x244 in the transforms, I'm getting the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-2da0ffaf5447> in <module>()
----> 1 train()

13 frames
<ipython-input-6-cd4463cf3c91> in train()
      3   data_module = KineticsDataModule()
      4   trainer = pytorch_lightning.Trainer()
----> 5   trainer.fit(classification_module, data_module)

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    456         )
    457 
--> 458         self._run(model)
    459 
    460         assert self.state.stopped

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in _run(self, model)
    754 
    755         # dispatch `start_training` or `start_evaluating` or `start_predicting`
--> 756         self.dispatch()
    757 
    758         # plugin will finalized fitting (e.g. ddp_spawn will load trained model)

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in dispatch(self)
    795             self.accelerator.start_predicting(self)
    796         else:
--> 797             self.accelerator.start_training(self)
    798 
    799     def run_stage(self):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py in start_training(self, trainer)
     94 
     95     def start_training(self, trainer: 'pl.Trainer') -> None:
---> 96         self.training_type_plugin.start_training(trainer)
     97 
     98     def start_evaluating(self, trainer: 'pl.Trainer') -> None:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in start_training(self, trainer)
    142     def start_training(self, trainer: 'pl.Trainer') -> None:
    143         # double dispatch to initiate the training loop
--> 144         self._results = trainer.run_stage()
    145 
    146     def start_evaluating(self, trainer: 'pl.Trainer') -> None:

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in run_stage(self)
    805         if self.predicting:
    806             return self.run_predict()
--> 807         return self.run_train()
    808 
    809     def _pre_training_routine(self):

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in run_train(self)
    840             self.progress_bar_callback.disable()
    841 
--> 842         self.run_sanity_check(self.lightning_module)
    843 
    844         self.checkpoint_connector.has_trained = False

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
   1105 
   1106             # run eval step
-> 1107             self.run_evaluation()
   1108 
   1109             self.on_sanity_check_end()

/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py in run_evaluation(self, on_epoch)
    947             dl_max_batches = self.evaluation_loop.max_batches[dataloader_idx]
    948 
--> 949             for batch_idx, batch in enumerate(dataloader):
    950                 if batch is None:
    951                     continue

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    515             if self._sampler_iter is None:
    516                 self._reset()
--> 517             data = self._next_data()
    518             self._num_yielded += 1
    519             if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
   1197             else:
   1198                 del self._task_info[idx]
-> 1199                 return self._process_data(data)
   1200 
   1201     def _try_put_index(self):

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _process_data(self, data)
   1223         self._try_put_index()
   1224         if isinstance(data, ExceptionWrapper):
-> 1225             data.reraise()
   1226         return data
   1227 

/usr/local/lib/python3.7/dist-packages/torch/_utils.py in reraise(self)
    427             # have message field
    428             raise self.exc_type(message=msg)
--> 429         raise self.exc_type(msg)
    430 
    431 

RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 35, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 73, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 73, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 8, 256, 454] at entry 0 and [3, 8, 256, 144] at entry 5

I was expecting something like [3, 8, 244, 244] due to RandomCrop(244) in the DataLoader. What am I missing? Thanks in advance for your help!
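One thing worth checking: the traceback comes from the validation sanity check (run_sanity_check → run_evaluation), and the val_dataloader transform above applies only subsampling and normalization, with no spatial scale or crop, which could explain why clips of differing widths reach default_collate. A small debugging sketch, assuming the KineticsDataModule above, to confirm where the mismatched shapes come from:

# Hypothetical debugging sketch: iterate each dataset directly, bypassing the
# DataLoader, and print the clip shapes produced by the transforms. If the val
# clips already differ in width here, the val transform (which has no
# crop/scale) is the culprit rather than the DataLoader itself.
data_module = KineticsDataModule()

for name, loader in [("train", data_module.train_dataloader()),
                     ("val", data_module.val_dataloader())]:
    for i, sample in enumerate(iter(loader.dataset)):
        print(name, i, tuple(sample["video"].shape))  # (C, T, H, W)
        if i >= 3:
            break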

Ava Dataset Loader

🚀 Feature

Request: Implementation for AVA Dataset for localized human actions research

Motivation

We're working on action recognition models with localized actions within a frame, using the facebookresearch SlowFast repo. This library seems like a better way to go in terms of development, but we're currently stuck because our own dataset is formatted like AVA.

Pitch

To be able to train SlowFast models using the AVA dataset! I'd be willing to help create the DataLoader as well.

Kinetics data loader

I tried to follow the documentation tutorial "Training a PyTorchVideo classification model" using the Kinetics dataset.
In the official annotation file I noticed two columns, "time_start" and "time_end", which indicate when the activity occurs. I didn't see any mention of these in the LabeledVideoDataset class; the CSV is only expected to contain "video_path label".
So my question is: do you assume the videos are already trimmed to the right time window? Or do you ignore the frame-level timing, feed a random clip of the video, and assume the activity is contained in it?
By the way, the official script for downloading the Kinetics dataset doesn't work, at least for me (I couldn't initialize a new conda environment with the environment file they provide, and even after installing the package by hand the script doesn't do anything).
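For reference, LabeledVideoPaths.from_csv expects each line to be a whitespace-separated "video_path label" pair, so the time window has to be handled before the dataset sees the file. A hypothetical sketch of building such a CSV from the official annotations, assuming the clips on disk are already trimmed to [time_start, time_end]; the annotation path, column names, and file-naming scheme below are assumptions:

import csv
import os

label_to_id = {}
rows = []
with open("kinetics400/annotations/train.csv") as f:
    for r in csv.DictReader(f):
        label_to_id.setdefault(r["label"], len(label_to_id))
        # Assumed naming scheme for the trimmed clip files.
        name = f'{r["youtube_id"]}_{int(r["time_start"]):06d}_{int(r["time_end"]):06d}.mp4'
        rows.append((os.path.join("kinetics400/train", name), label_to_id[r["label"]]))

# Write the "<video_path> <integer_label>" lines that the dataset expects.
with open("train.csv", "w") as f:
    for path, label_id in rows:
        f.write(f"{path} {label_id}\n")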

TypeError: cannot pickle 'torch._C.Generator' object

It throws an exception when I follow the official tutorial to implement a video classification model.

https://pytorchvideo.org/docs/tutorial_classification

Environment:

platform: macOS-10.16-x86_64-i386-64bit
python version: 3.8.5
torch version: 1.8.1
torchvision version: 0.9.1
pytorch_lightning version: 1.2.8
pytorchvideo version: 0.1.0
fvcore version: 0.1.4.post20210326

The code

import os
import pytorch_lightning as pl
import pytorchvideo.data
import torch.utils.data

from pytorchvideo.transforms import (
    ApplyTransformToKey,
    RandomShortSideScale,
    RemoveKey,
    ShortSideScale,
    UniformTemporalSubsample
)

from torchvision.transforms import (
    Compose,
    Normalize,
    RandomCrop,
    RandomHorizontalFlip
)

class KineticsDataModule(pl.LightningDataModule):
    
    def __init__(self):
        super().__init__()
        self.transform = Compose(
            [
            ApplyTransformToKey(
              key="video",
              transform=Compose(
                  [
                    UniformTemporalSubsample(8),
                    Normalize((0.45, 0.45, 0.45), (0.225, 0.225, 0.225)),
                    RandomShortSideScale(min_size=256, max_size=320),
                    RandomCrop(244),
                    RandomHorizontalFlip(p=0.5),
                  ]
                ),
              ),
            ]
        )
        
    def train_dataloader(self):
        train_dataset = pytorchvideo.data.Kinetics(
            data_path=VIDEO_PATH + "/train",
            clip_sampler=pytorchvideo.data.make_clip_sampler("random", 2),
            transform=self.transform,
        )
        return torch.utils.data.DataLoader(
            train_dataset,
            batch_size=8,
            num_workers=8,
        )

    def val_dataloader(self):
        val_dataset = pytorchvideo.data.Kinetics(
            data_path=VIDEO_PATH + "/valid",
            clip_sampler=pytorchvideo.data.make_clip_sampler("uniform", 2),
            transform=self.transform,
        )
        return torch.utils.data.DataLoader(
            val_dataset,
            batch_size=8,
            num_workers=8,
        )
import pytorchvideo.models.resnet
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_kinetics_resnet():
    return pytorchvideo.models.resnet.create_resnet(
        input_channel=3, 
        model_depth=50,
        model_num_class=4,
        norm=nn.BatchNorm3d,
        activation=nn.ReLU,
    )

class ClassificationModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = make_kinetics_resnet()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        # The model expects a video tensor of shape (B, C, T, H, W), which is the 
        # format provided by the dataset
        y_hat = self.model(batch["video"])

        # Compute cross entropy loss, loss.backwards will be called behind the scenes
        # by PyTorchLightning after being returned from this method.
        loss = F.cross_entropy(y_hat, batch["label"])

        # Log the train loss to Tensorboard
        self.log("train_loss", loss.item())

        return loss

    def validation_step(self, batch, batch_idx):
        y_hat = self.model(batch["video"])
        loss = F.cross_entropy(y_hat, batch["label"])
        self.log("val_loss", loss)
        return loss

    def configure_optimizers(self):
        """
        Setup the Adam optimizer. Note, that this function also can return a lr scheduler, which is
        usually useful for training video models.
        """
        return torch.optim.Adam(self.parameters(), lr=1e-1)
classification_module = ClassificationModule()
data_module = KineticsDataModule()
trainer = pl.Trainer()
trainer.fit(classification_module, datamodule=data_module)

The full log:

GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type | Params
-------------------------------
0 | model | Net  | 31.7 M
-------------------------------
31.7 M    Trainable params
0         Non-trainable params
31.7 M    Total params
126.646   Total estimated model params size (MB)
Validation sanity check: 0%
0/2 [00:00<?, ?it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-a7ab4758bd42> in <module>
      2 data_module = KineticsDataModule()
      3 trainer = pl.Trainer()
----> 4 trainer.fit(classification_module, datamodule=data_module)

~/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    497 
    498         # dispath `start_training` or `start_testing` or `start_predicting`
--> 499         self.dispatch()
    500 
    501         # plugin will finalized fitting (e.g. ddp_spawn will load trained model)

~/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in dispatch(self)
    544 
    545         else:
--> 546             self.accelerator.start_training(self)
    547 
    548     def train_or_test_or_predict(self):

~/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py in start_training(self, trainer)
     71 
     72     def start_training(self, trainer):
---> 73         self.training_type_plugin.start_training(trainer)
     74 
     75     def start_testing(self, trainer):

~/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in start_training(self, trainer)
    112     def start_training(self, trainer: 'Trainer') -> None:
    113         # double dispatch to initiate the training loop
--> 114         self._results = trainer.run_train()
    115 
    116     def start_testing(self, trainer: 'Trainer') -> None:

~/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_train(self)
    605             self.progress_bar_callback.disable()
    606 
--> 607         self.run_sanity_check(self.lightning_module)
    608 
    609         # set stage for logging

~/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
    862 
    863             # run eval step
--> 864             _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
    865 
    866             self.on_sanity_check_end()

~/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_evaluation(self, max_batches, on_epoch)
    711             dl_max_batches = self.evaluation_loop.max_batches[dataloader_idx]
    712 
--> 713             for batch_idx, batch in enumerate(dataloader):
    714                 if batch is None:
    715                     continue

~/opt/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __iter__(self)
    353             return self._iterator
    354         else:
--> 355             return self._get_iterator()
    356 
    357     @property

~/opt/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _get_iterator(self)
    299         else:
    300             self.check_worker_number_rationality()
--> 301             return _MultiProcessingDataLoaderIter(self)
    302 
    303     @property

~/opt/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __init__(self, loader)
    912             #     before it starts, and __del__ tries to join but will get:
    913             #     AssertionError: can only join a started process.
--> 914             w.start()
    915             self._index_queues.append(index_queue)
    916             self._workers.append(w)

~/opt/anaconda3/lib/python3.8/multiprocessing/process.py in start(self)
    119                'daemonic processes are not allowed to have children'
    120         _cleanup()
--> 121         self._popen = self._Popen(self)
    122         self._sentinel = self._popen.sentinel
    123         # Avoid a refcycle if the target function holds an indirect

~/opt/anaconda3/lib/python3.8/multiprocessing/context.py in _Popen(process_obj)
    222     @staticmethod
    223     def _Popen(process_obj):
--> 224         return _default_context.get_context().Process._Popen(process_obj)
    225 
    226 class DefaultContext(BaseContext):

~/opt/anaconda3/lib/python3.8/multiprocessing/context.py in _Popen(process_obj)
    282         def _Popen(process_obj):
    283             from .popen_spawn_posix import Popen
--> 284             return Popen(process_obj)
    285 
    286     class ForkServerProcess(process.BaseProcess):

~/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py in __init__(self, process_obj)
     30     def __init__(self, process_obj):
     31         self._fds = []
---> 32         super().__init__(process_obj)
     33 
     34     def duplicate_for_child(self, fd):

~/opt/anaconda3/lib/python3.8/multiprocessing/popen_fork.py in __init__(self, process_obj)
     17         self.returncode = None
     18         self.finalizer = None
---> 19         self._launch(process_obj)
     20 
     21     def duplicate_for_child(self, fd):

~/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py in _launch(self, process_obj)
     45         try:
     46             reduction.dump(prep_data, fp)
---> 47             reduction.dump(process_obj, fp)
     48         finally:
     49             set_spawning_popen(None)

~/opt/anaconda3/lib/python3.8/multiprocessing/reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

TypeError: cannot pickle 'torch._C.Generator' object
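A possible workaround, not an official fix: on macOS, multiprocessing uses the "spawn" start method, so each DataLoader worker has to pickle the dataset object, and the Kinetics dataset presumably holds a torch.Generator for its random clip/video sampling, which cannot be pickled. Loading data in the main process sidesteps the pickling entirely (at the cost of slower loading):

# Sketch: replace the DataLoader construction in train_dataloader/val_dataloader.
loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=8,
    num_workers=0,  # 0 = no worker processes are spawned, so nothing is pickled
)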

Can not install pytorchvideo

When I use the following steps to install pytorchvideo:

3. Install from a local clone

git clone https://github.com/facebookresearch/pytorchvideo.git
cd pytorchvideo
pip install -e .

# For developing and testing
pip install -e .[test,dev]

I got the following error. How can I resolve this problem?

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///home/tccmedia/github/pytorchvideo
Collecting fvcore
  Downloading fvcore-0.1.5.post20210630.tar.gz (49 kB)
     |████████████████████████████████| 49 kB 3.2 MB/s
Collecting av
  Downloading av-8.0.3-cp36-cp36m-manylinux2010_x86_64.whl (37.2 MB)
     |████████████████████████████████| 37.2 MB 6.7 MB/s
Collecting parameterized
  Downloading parameterized-0.8.1-py2.py3-none-any.whl (26 kB)
Collecting iopath
  Downloading iopath-0.1.9-py3-none-any.whl (27 kB)
ERROR: Package 'pytorchvideo' requires a different Python: 3.6.9 not in '>=3.7'


Enable `pretrained=True` with different num_classes for head.

🚀 Feature


Currently, the available models don't support pretrained=True together with num_classes={something_different_than_original_model}, because load_state_dict is called with strict=True.

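A possible workaround sketch until this is supported: load the pretrained model with its original 400-class head and then swap the final projection layer for one with the desired number of classes. The attribute path model.blocks[-1].proj is an assumption based on the ResNet head layout and may differ for other architectures.

import torch
import torch.nn as nn

# Load the pretrained Kinetics-400 model as-is, then replace its head projection.
model = torch.hub.load("facebookresearch/pytorchvideo", "slow_r50", pretrained=True)

num_classes = 10  # hypothetical target number of classes
head = model.blocks[-1]
head.proj = nn.Linear(head.proj.in_features, num_classes)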

PyTorch Lightning Flash Integration

🚀 Feature


Motivation

Dear people from PyTorchVideo,

First, congratulations on releasing this framework. It is fabulous!

We plan to integrate this framework within Lightning Flash.

We recently created a new BETA data processing API called DataPipeline. It makes Dataset obsolete and enables very thin customization and quick data augmentation experimentation.

It is built out of Preprocess and Postprocess objects with multiple hooks to override, and it aims at bridging the skew between training and serving.

Here is the tutorial: https://lightning-flash.readthedocs.io/en/latest/custom_task.html

Here is the doc about it: https://lightning-flash.readthedocs.io/en/latest/general/data.html

Here is the Lightning Flash GitHub: https://github.com/PytorchLightning/lightning-flash

We are really keen to collaborate on integrating this framework within Lightning Flash.

Small nits for the documentation:

model = EfficientX3d.load_from_checkpoint('https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/efficient_x3d_xs_original_form.pyth')

Best,
Thomas Chaton.


It throws an exception when training the model and loading the dataset from a directory.


πŸ› Bugs / Unexpected behaviors

It throws an exception when I load the dataset from a directory using the Kinetics dataset in the pytorchvideo data module, even though the directory already follows the expected format:
dir_path/<class_name>/<video_name>.mp4

../input/kinetics400partial/valid
├── blowing_glass
│   ├── ****.mp4
│   ├── ****.mp4
│   ├── ****.mp4
│   └── .....
├── long_jump
│   ├── ****.mp4
│   ├── ****.mp4
│   └── .....

Instructions To Reproduce the Issue:

The code of the data module is as follows:

class KineticsDataModule(pl.LightningDataModule):
    
    def __init__(self):
        super().__init__()
        self.transform = Compose(
            [
            ApplyTransformToKey(
              key="video",
              transform=Compose(
                  [
                    UniformTemporalSubsample(8),
                    Normalize((0.45, 0.45, 0.45), (0.225, 0.225, 0.225)),
                    RandomShortSideScale(min_size=256, max_size=320),
                    RandomCrop(244),
                    RandomHorizontalFlip(p=0.5),
                  ]
                ),
              ),
            ]
        )
        
    def train_dataloader(self):
        train_dataset = pytorchvideo.data.Kinetics(
            data_path="../input/kinetics400partial/train",
            clip_sampler=pytorchvideo.data.make_clip_sampler("random", 2),
            transform=self.transform,
        )
        return torch.utils.data.DataLoader(
            train_dataset,
            batch_size=8,
            num_workers=8,
        )

    def val_dataloader(self):
        val_dataset = pytorchvideo.data.Kinetics(
            data_path="../input/kinetics400partial/valid",
            clip_sampler=pytorchvideo.data.make_clip_sampler("uniform", 2),
            transform=self.transform,
        )
        return torch.utils.data.DataLoader(
            val_dataset,
            batch_size=8,
            num_workers=8,
        )

And the full log:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-a7ab4758bd42> in <module>
      2 data_module = KineticsDataModule()
      3 trainer = pl.Trainer()
----> 4 trainer.fit(classification_module, datamodule=data_module)

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    497 
    498         # dispath `start_training` or `start_testing` or `start_predicting`
--> 499         self.dispatch()
    500 
    501         # plugin will finalized fitting (e.g. ddp_spawn will load trained model)

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in dispatch(self)
    544 
    545         else:
--> 546             self.accelerator.start_training(self)
    547 
    548     def train_or_test_or_predict(self):

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py in start_training(self, trainer)
     71 
     72     def start_training(self, trainer):
---> 73         self.training_type_plugin.start_training(trainer)
     74 
     75     def start_testing(self, trainer):

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in start_training(self, trainer)
    112     def start_training(self, trainer: 'Trainer') -> None:
    113         # double dispatch to initiate the training loop
--> 114         self._results = trainer.run_train()
    115 
    116     def start_testing(self, trainer: 'Trainer') -> None:

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_train(self)
    605             self.progress_bar_callback.disable()
    606 
--> 607         self.run_sanity_check(self.lightning_module)
    608 
    609         # set stage for logging

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
    844         # to make sure program won't crash during val
    845         if should_sanity_check:
--> 846             self.reset_val_dataloader(ref_model)
    847             self.num_sanity_val_batches = [
    848                 min(self.num_sanity_val_steps, val_batches) for val_batches in self.num_val_batches

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py in reset_val_dataloader(self, model)
    362         has_step = is_overridden('validation_step', model)
    363         if has_loader and has_step:
--> 364             self.num_val_batches, self.val_dataloaders = self._reset_eval_dataloader(model, 'val')
    365 
    366     def reset_test_dataloader(self, model) -> None:

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py in _reset_eval_dataloader(self, model, mode)
    276         # always get the loaders first so we can count how many there are
    277         loader_name = f'{mode}_dataloader'
--> 278         dataloaders = self.request_dataloader(getattr(model, loader_name))
    279 
    280         if not isinstance(dataloaders, list):

/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py in request_dataloader(self, dataloader_fx)
    396             The dataloader
    397         """
--> 398         dataloader = dataloader_fx()
    399         dataloader = self._flatten_dl_only(dataloader)
    400 

<ipython-input-3-e66142f2568e> in val_dataloader(self)
     56             data_path="../input/kinetics400partial/valid",
     57             clip_sampler=pytorchvideo.data.make_clip_sampler("uniform", 2),
---> 58             transform=self.transform,
     59         )
     60         return torch.utils.data.DataLoader(

/opt/conda/lib/python3.7/site-packages/pytorchvideo/data/encoded_video_dataset.py in labeled_encoded_video_dataset(data_path, clip_sampler, video_sampler, transform, video_path_prefix, decode_audio, decoder)
    266     # with PyTorch DataLoader workers. To avoid this, we make sure the PathManager
    267     # calls (made by LabeledVideoPaths) are wrapped in their own sandboxed process.
--> 268     labeled_video_paths = LabeledVideoPaths.from_path(data_path)
    269 
    270     labeled_video_paths.path_prefix = video_path_prefix

/opt/conda/lib/python3.7/site-packages/pytorchvideo/data/labeled_video_paths.py in from_path(cls, data_path)
     30             return LabeledVideoPaths.from_csv(data_path)
     31         elif g_pathmgr.isdir(data_path):
---> 32             return LabeledVideoPaths.from_directory(data_path)
     33         else:
     34             raise FileNotFoundError(f"{data_path} not found.")

/opt/conda/lib/python3.7/site-packages/pytorchvideo/data/labeled_video_paths.py in from_directory(cls, dir_path)
    102         assert (
    103             len(video_paths_and_label) > 0
--> 104         ), f"Failed to load dataset from {dir_path}."
    105         return cls(video_paths_and_label)
    106 

AssertionError: Failed to load dataset from ../input/kinetics400partial/valid.

Webcam implementation

I was wondering if there is a way to do action recognition with pytorchvideo using a live webcam feed?

Problem with pretrained CSN

Hi everyone!

I have a problem that arises only when I use a pretrained network, no matter which network it is.

In particular I am getting the following error:

Traceback (most recent call last):
File "C:/Users/Microlab/Desktop/Marco/videoClassification/Train.py", line 199, in
train()
File "C:/Users/Microlab/Desktop/Marco/videoClassification/Train.py", line 196, in train
trainer.fit(classification_module)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 458, in fit
self._run(model)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 756, in _run
self.dispatch()
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 797, in dispatch
self.accelerator.start_training(self)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 96, in start_training
self.training_type_plugin.start_training(trainer)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 144, in start_training
self._results = trainer.run_stage()
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 807, in run_stage
return self.run_train()
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 869, in run_train
self.train_loop.run_training_epoch()
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 566, in run_training_epoch
self.on_train_epoch_end(epoch_output)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 606, in on_train_epoch_end
training_epoch_end_output = model.training_epoch_end(processed_epoch_output)
File "C:/Users/Microlab/Desktop/Marco/videoClassification/Train.py", line 133, in training_epoch_end
self.logger.experiment.add_graph(LightVideoClassification(), sampleImg)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\utils\tensorboard\writer.py", line 723, in add_graph
self._get_file_writer().add_graph(graph(model, input_to_model, verbose))
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\utils\tensorboard_pytorch_graph.py", line 292, in graph
raise e
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\utils\tensorboard_pytorch_graph.py", line 286, in graph
trace = torch.jit.trace(model, args)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\jit_trace.py", line 742, in trace
_module_class,
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\jit_trace.py", line 940, in trace_module
_force_outplace,
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\nn\modules\module.py", line 887, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\nn\modules\module.py", line 860, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:/Users/Microlab/Desktop/Marco/videoClassification/Train.py", line 70, in forward
return self.model(x)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\nn\modules\module.py", line 887, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\nn\modules\module.py", line 860, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\pytorchvideo\models\net.py", line 43, in forward
x = self.blocks[idx](x)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\nn\modules\module.py", line 887, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\nn\modules\module.py", line 860, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\pytorchvideo\models\stem.py", line 253, in forward
x = self.conv(x)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\nn\modules\module.py", line 887, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\nn\modules\module.py", line 860, in _slow_forward
result = self.forward(*input, **kwargs)
File "C:\Users\Microlab\miniconda3\envs\simone\lib\site-packages\torch\nn\modules\conv.py", line 521, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected 5-dimensional input for 5-dimensional weight [64, 3, 3, 7, 7], but got 4-dimensional input of size [3, 60, 224, 224] instead
Epoch 1: 75%|███████▌ | 174/232 [03:34<01:11, 1.23s/it, loss=0.703, v_num=15]

As you can see, the first epoch finishes fine, as does 75% of the second one.

Can anyone help me?
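One hedged observation based only on the traceback: the failure happens when training_epoch_end calls add_graph, and the traced input sampleImg is 4-dimensional ([3, 60, 224, 224]) while the 3-D conv stem expects a 5-dimensional (B, C, T, H, W) tensor. A minimal sketch of one possible fix, reusing the names from the traceback (everything else is an assumption):

# Add a batch dimension before tracing, so the 3-D convolutions receive the
# 5-D (B, C, T, H, W) input they expect.
sampleImg = sampleImg.unsqueeze(0)  # (3, 60, 224, 224) -> (1, 3, 60, 224, 224)
self.logger.experiment.add_graph(LightVideoClassification(), sampleImg)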

Kinetics' frame dataset

I am using my own dataset for training. Previously the data was stored as mp4 files, like Kinetics. Now I want to convert the mp4 files to video frames, so that each folder contains JPG frames. How can I modify the code?

fvcore required version


πŸ› Bugs / Unexpected behaviors

detectron2 and pytorchvideo have contradicting requirements for fvcore

Instructions To Reproduce the Issue:

  1. I have Ubuntu 20.04, pytorch v.1.8.1+cu102, Cuda compilation tools, release 10.1, V10.1.243
  2. So I tried to install detectron2 (https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md)
    python -m pip install detectron2 -f
    https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
  3. And have this one error:
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    pytorchvideo 0.1.1 requires fvcore>=0.1.4, but you have fvcore 0.1.3.post20210317 which is incompatible.
  4. So I update fvcore: pip install -U fvcore
  5. And get another error:
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    detectron2 0.4+cu102 requires fvcore<0.1.4,>=0.1.3, but you have fvcore 0.1.5.post20210609 which is incompatible.

Real Time Video?

I would like to use this on real-time video rather than video files. Possible?

RuntimeError: stack expects each tensor to be equal size, but got [3, 61, 864, 1152] at entry 0 and [3, 60, 864, 1152] at entry 1


Hi everyone! I am getting the error above when using a custom dataset; does anyone know why?
I assumed it was due to the videos having different frame rates (so the time dimension T would differ from one clip sample to the next), and therefore I preprocessed them to have the same frame rate. However, it still does not work.

Thanks.

Train on slowfast model ...

The current tutorial only supports training the slow_r50 model, not the slowfast_r50 model.

I would like pytorchvideo to support training slowfast_r50 by modifying the data loader in the classification tutorial example.
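For reference, the main change needed on the data side is that slowfast_r50 expects a list of two tensors (slow and fast pathways) rather than a single clip tensor. Below is a sketch of a PackPathway-style transform along the lines of the one defined in the torch hub inference tutorial; the alpha value of 4 is an assumption taken from the SlowFast 8x8 R50 configuration.

import torch

class PackPathway(torch.nn.Module):
    """Turn a clip tensor (C, T, H, W) into the [slow, fast] list SlowFast expects."""

    def __init__(self, alpha: int = 4):
        super().__init__()
        self.alpha = alpha

    def forward(self, frames: torch.Tensor):
        fast_pathway = frames
        # The slow pathway keeps one out of every `alpha` frames.
        slow_pathway = torch.index_select(
            frames,
            1,
            torch.linspace(0, frames.shape[1] - 1, frames.shape[1] // self.alpha).long(),
        )
        return [slow_pathway, fast_pathway]

Appended as the last transform inside ApplyTransformToKey(key="video"), it makes the dataset yield the two-pathway input that slowfast_r50 consumes.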

Is there anyway to help me speed up in loading the video data?

I'm currently writing a dataloader with pytorchvideo; everything works fine except for the speed. It takes almost 1 second to load each video clip. Some of my code is shown below.

transform = ApplyTransformToKey(
    key="video",
    transform=Compose([
        UniformTemporalSubsample(64),
        Lambda(lambda x: x / 255.0),
        NormalizeVideo(mean, std),
        ShortSideScale(size=256),
        CenterCropVideo(256),
    ]),
)

My transform, in short, gets 64 frames from a 64-second video clip.

def __getitem__(self, index):
    video_data = self.video.get_clip(start_sec=index * self.length, end_sec=(index + 1) * self.length)
    video_data = self.transform(video_data)
    imgs_data, audios_data = video_data['video'], video_data['audio']
    labels = torch.from_numpy(self.va_read(int(index + 1))).t()
    return imgs_data, audios_data, labels, self.movie_name, index

Part of the timing log showing the data-loading time:
Epoch: [0][4/24] Time_train(average) 30.307 (30.044) Data_load(average) 30.035 (29.764)
Epoch: [0][5/24] Time_train(average) 58.678 (31.728) Data_load(average) 58.383 (31.448)
Epoch: [0][6/24] Time_train(average) 49.739 (35.320) Data_load(average) 49.315 (35.036)
Epoch: [0][6/24] Time_train(average) 53.579 (38.151) Data_load(average) 53.336 (37.872)
Epoch: [0][7/24] Time_train(average) 51.027 (38.822) Data_load(average) 50.760 (38.543)

So is there anything I can do to speed up video loading? One more thing: given a fixed clip length, the audio lengths don't seem to be the same, which causes an error when the batch size is more than 1. How can I set the audio sampling rate? I would be thankful for any help or suggestions.
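A small diagnostic sketch (my own, dropped into __getitem__ above; self.video, self.length and self.transform come from the snippet and are otherwise assumptions) to see whether the time goes into decoding the 64-second clip or into the transform:

import time

t0 = time.time()
video_data = self.video.get_clip(start_sec=index * self.length,
                                 end_sec=(index + 1) * self.length)
t1 = time.time()
video_data = self.transform(video_data)
t2 = time.time()
# If decode dominates, shorter clips (fewer decoded seconds per sample) will
# help more than tuning the transforms or the DataLoader.
print(f"decode: {t1 - t0:.2f}s  transform: {t2 - t1:.2f}s")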

Do I need a Softmax at the end of a pretrained network?

When I load a pretrained network and specify the number of classes, I correctly receive a tensor having 4 dimensions. However, I then compare this with the labels as suggested in the tutorial. Do I need something that retrieves the index of the most probable class, or is it handled automatically by Lightning?

Thanks
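For what it's worth, the tutorial's training_step uses F.cross_entropy, which takes raw logits and applies log-softmax internally, so no Softmax layer is needed at the end of the network for training. Retrieving the most probable class is something you do yourself at inference time; a minimal sketch:

import torch
import torch.nn.functional as F

logits = torch.randn(2, 4)        # stand-in for the model output (batch, num_classes)
probs = F.softmax(logits, dim=1)  # optional: turn logits into probabilities
pred = probs.argmax(dim=1)        # index of the most probable class per sample
print(pred)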

Input error

While following the "Running a pre-trained PyTorchVideo classification model using Torch Hub" tutorial, I get this error when I call preds = model(inputs). I only get it when I use "slowfast_r50" as the model name; when I change the model name to slow_r50 it works fine.

RuntimeError Traceback (most recent call last)
in
----> 1 preds = model(inputs)

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),

~\pytorchvideo\pytorchvideo\models\net.py in forward(self, x)
41 def forward(self, x: torch.Tensor) -> torch.Tensor:
42 for idx in range(len(self.blocks)):
---> 43 x = self.blocks[idx](x)
44 return x
45

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),

~\pytorchvideo\pytorchvideo\models\net.py in forward(self, x)
85 for pathway_idx in range(len(self.multipathway_blocks)):
86 if self.multipathway_blocks[pathway_idx] is not None:
---> 87 x_out[pathway_idx] = self.multipathway_blocks[pathway_idx](
88 x[pathway_idx]
89 )

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),

~\pytorchvideo\pytorchvideo\models\stem.py in forward(self, x)
251
252 def forward(self, x: torch.Tensor) -> torch.Tensor:
--> 253 x = self.conv(x)
254 if self.norm is not None:
255 x = self.norm(x)

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),

~\anaconda3\lib\site-packages\torch\nn\modules\conv.py in forward(self, input)
518 self.weight, self.bias, self.stride, _triple(0),
519 self.dilation, self.groups)
--> 520 return F.conv3d(input, self.weight, self.bias, self.stride,
521 self.padding, self.dilation, self.groups)
522

RuntimeError: Expected 5-dimensional input for 5-dimensional weight [64, 3, 1, 7, 7], but got 4-dimensional input of size [1, 8, 256, 256] instead

Feature Extraction

🚀 Feature

It would be great if the models API had an easy way to instantiate a model with the purpose of feature extraction!

Meanwhile, is there any workaround?
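One workaround sketch (there is no dedicated feature-extraction entry point that I'm aware of): load a pretrained model and replace the classification projection with nn.Identity, so the forward pass returns the pooled backbone features instead of class logits. The attribute path model.blocks[-1].proj is an assumption based on the ResNet head layout.

import torch
import torch.nn as nn

model = torch.hub.load("facebookresearch/pytorchvideo", "slow_r50", pretrained=True)
model.blocks[-1].proj = nn.Identity()  # drop the final classification layer
model.eval()

with torch.no_grad():
    feats = model(torch.randn(1, 3, 8, 224, 224))  # pooled features, e.g. (1, 2048)
print(feats.shape)

A more general alternative is to register a forward hook on whichever block's output you want to keep.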

Torch Hub Model Names

❓ Questions on how to use PyTorchVideo

I am trying to load every model from the model zoo one by one. However, there is no list of model names to use in the torch.hub.load() function. For example, I know from the tutorial that 'slow_r50' corresponds to the Slow R50 8x8 model, but I can't find how to load the Slow R50 4x16 model; more specifically, what string do I use in torch.hub.load()? Another example is the I3D model: I have tried a number of different strings to load a pre-trained I3D model from torch hub, but none work. I see a few different models in hubconf.py, but not all of them.

Could you supply a list of strings for the torch.hub.load() function and their corresponding model zoo names?

Thanks!
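Not a full list, but torch.hub can enumerate the entrypoints that hubconf.py exposes, and those strings are exactly what torch.hub.load accepts; zoo variants that are not in hubconf.py (which seems to be the case for some of them) cannot be loaded by name this way.

import torch

# Print every entrypoint defined in the repo's hubconf.py.
print(torch.hub.list("facebookresearch/pytorchvideo", force_reload=True))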

Loading and inference with a Charades pre-trained model

I'm looking into how to load a model pre-trained on a dataset other than Kinetics. In particular, I want to load SlowFast pre-trained on Charades. For that, I tried the following.

from pytorchvideo.models.hub.slowfast import _slowfast
import torch

root_url = "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo"
checkpoint_path = f"{root_url}/charades/SLOWFAST_8x8_R50.pyth"
slowfast = _slowfast(pretrained=True, checkpoint_path=checkpoint_path)

with torch.no_grad():
    output = slowfast([torch.rand(1, 3, 8, 224, 224), torch.rand(1, 3, 32, 224, 224)])
    # torch.Size([1, 400])
    print(output.size())

My questions are:

  1. Since the model loading succeeds, why does the model output still correspond to the Kinetics-400 number of classes? I expected an error in case the model architecture and the loaded checkpoint don't match. (See the inspection sketch after this list.)

  2. I couldn't find a concrete tutorial on how to load these models.

  3. Once the model is loaded, what is the right way to preprocess the video so that it matches exactly how the model was trained on the corresponding dataset?
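Regarding question 1, a hedged inspection sketch: download the Charades checkpoint and look at the shape of the head projection weights to see how many classes it was actually trained with. The "model_state" key and the use of "proj" in the parameter names are assumptions about the checkpoint layout.

import torch

url = ("https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/"
       "charades/SLOWFAST_8x8_R50.pyth")
ckpt = torch.hub.load_state_dict_from_url(url, map_location="cpu")
state = ckpt.get("model_state", ckpt)  # fall back to the raw dict if the key differs
for name, tensor in state.items():
    if "proj" in name:
        print(name, tuple(tensor.shape))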

ColorJitter is not compatible with ApplyTransformToKey(key="video")

As described in some of your tutorials, I typically use torchvision transformations which I compose and then feed into ApplyTransformToKey. I have tried many, and they all seem to work.

However, it seems ColorJitter (from torchvision.transforms) is the first that does not work. It confuses the time dimension with the channel dimension in the tensor, and therefore says "TypeError: Input image tensor permitted channel values are [3], but found 30".

I was wondering, is this intended behavior? If so, is there any way I can make ColorJitter work for video with pytorchvideo?

Thanks a lot in advance!
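One workaround sketch, not an official transform: pytorchvideo clips are (C, T, H, W) while torchvision's ColorJitter expects the channel dimension at position -3, so moving the time dimension in front turns the clip into a batch of frames that ColorJitter handles, and then it is moved back.

import torch
from torchvision.transforms import ColorJitter

class VideoColorJitter(torch.nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        self.jitter = ColorJitter(**kwargs)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        video = video.permute(1, 0, 2, 3)  # (C, T, H, W) -> (T, C, H, W)
        video = self.jitter(video)         # jitter parameters are sampled once per clip
        return video.permute(1, 0, 2, 3)   # back to (C, T, H, W)

Note that it should be applied before Normalize, since ColorJitter expects pixel values in [0, 1] (or uint8).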

RAM Crash on Google Colab

I am trying to run the demo under pytorchvideo/tutorials/video_detection_example/video_detection_inference_tutorial.ipynb on Google Colab; however, when I try to load the video using

encoded_vid = pytorchvideo.data.encoded_video.EncodedVideo.from_path('theatre.webm')
print('Completed loading encoded video.')

Google Colab crashes, saying there is not enough RAM. Do I need Google Colab Pro to be able to run this demo?

No such operator video_reader::read_video_from_memory when using torchvision backend

πŸ› Bugs / Unexpected behaviors

I get a RuntimeError: No such operator video_reader::read_video_from_memory when trying to decode a video using the torchvision backend.

Instructions To Reproduce the Issue:

Run the following code:

from pytorchvideo.data.encoded_video import EncodedVideo

path_video = "/path/to/video.mp4"
video_pyav = EncodedVideo.from_path(path_video, decoder='pyav')  # Runs without any problem
video_torchvision = EncodedVideo.from_path(path_video, decoder='torchvision')  # Throws error

Logs:

Failed to decode video of name <video_name>.mp4. No such operator video_reader::read_video_from_memory
Traceback (most recent call last):
  File "/path/to/conda/env/lib/python3.7/site-packages/pytorchvideo/data/encoded_video_torchvision.py", line 206, in _torch_vision_decode_video
    raise e
  File "/path/to/conda/env/lib/python3.7/site-packages/pytorchvideo/data/encoded_video_torchvision.py", line 180, in _torch_vision_decode_video
    tv_result = torch.ops.video_reader.read_video_from_memory(
  File "/path/to/conda/env/lib/python3.7/site-packages/torch/_ops.py", line 60, in __getattr__
    op = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator video_reader::read_video_from_memory
python-BaseException

The current version of the libraries is:
pytorchvideo -> 0.1.2
torch -> 1.9.0+cu111
torchvision -> 0.10.0+cu111

Thanks!

Little mistake - Get Started Website Description

Link: https://pytorchvideo.org/

My import libraries

import torch
from torchvision.transforms import Compose, Lambda
from torchvision.transforms._transforms_video import (
    CenterCropVideo,
    NormalizeVideo,
)
from pytorchvideo.data.encoded_video import EncodedVideo
from pytorchvideo.transforms import (
    ApplyTransformToKey,
    ShortSideScale,
    UniformTemporalSubsample,
    UniformCropVideo
)

Current version on website:

# Generate top 5 predictions
post_act = F.softmax(dim=1)
preds = post_act(preds)
pred_class_ids = preds.topk(k=5).indices

Correct version:

# Generate top 5 predictions
post_act = torch.nn.Softmax(dim=1)
preds = post_act(preds)
pred_class_ids = preds.topk(k=5).indices
print(pred_class_ids)

# Mapping predicted classes

pred_class_names = [kinetics_id_to_classname[int(i)] for i in pred_class_ids[0]]
print("Predicted labels: %s" % ", ".join(pred_class_names))

Output example

[1] tensor([[192, 104, 315, 268,  78]], device='cuda:0')
[2] Predicted labels: marching, driving tractor, sled dog racing, riding camel, crossing river

Documentation

https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html

Implementation of "nested" or "ragged" tensors

I want to work with video data where clips have different lengths, i.e. different numbers of frames. As of now, in default_collate(batch) in .../torch/utils/data/_utils/collate.py, the elements of the batch are turned into a single tensor using torch.stack(batch, 0, out=out). Are there any plans to introduce nested tensors (https://github.com/pytorch/nestedtensor) in the near future, to be able to work with video clips of varying length?

Thanks in advance and thank you for the library, it is a pleasure to work with.
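Until nested tensors land, one common workaround sketch is a custom collate_fn that keeps the variable-length clips as a plain Python list and only stacks the fields that are guaranteed to share a shape (the field names below match the dict returned by the dataset; the rest is an assumption about your setup):

import torch
from torch.utils.data import DataLoader

def list_collate(batch):
    return {
        "video": [sample["video"] for sample in batch],               # list of (C, T_i, H, W)
        "label": torch.tensor([sample["label"] for sample in batch]), # labels can be stacked
    }

# loader = DataLoader(dataset, batch_size=8, collate_fn=list_collate)

The model (or a padding step) then has to handle a list of clips per batch.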

Loading videos at a desired frame rate (fps)

🚀 Feature

Enable the EncodedVideoDataset class to accept an fps argument that determines the frame rate at which the videos are read.

Motivation

Several action recognition methods lower the input fps as a way of reducing computational load (e.g. https://arxiv.org/abs/2103.13915).
Currently there is no way of using the Kinetics dataset with a required fps. It would also be useful to specify the desired number of frames per clip and get a clip with uniformly sampled frames from the whole video.
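Until an fps argument exists, a partial workaround sketch for the second point: the existing UniformTemporalSubsample transform yields a fixed number of uniformly spaced frames from whatever clip was decoded, which fixes the frame count per clip independently of the source fps (it does not reduce decoding cost, though).

from torchvision.transforms import Compose
from pytorchvideo.transforms import ApplyTransformToKey, UniformTemporalSubsample

num_frames = 16  # hypothetical target number of frames per clip
transform = ApplyTransformToKey(
    key="video",
    transform=Compose([UniformTemporalSubsample(num_frames)]),
)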

LabeledVideoDataset should return clip_start and clip_end

🚀 Feature

It would be great if LabeledVideoDataset could return clip_start and clip_end, which are returned by the clip sampler.

Motivation

This is important if you are loading video and audio separately, since you need to know the clip_start and clip_end from the sampler to use the same window for the audio, which is loaded separately but must stay in sync.

Pitch

Literally adding clip_start and clip_end to the return dict of LabeledVideoDataset. I have done this locally and it works fine. I would do a pull request but it seems like such a small detail that it's not worth it. But if you would like me to do that let me know.

C2D with torch hub not possible?

❓ Questions on how to use PyTorchVideo

Hello,

Thank you for releasing this nice framework for video domain.

I was wondering if it is possible to use the C2D model with this library? It is mentioned in the PyTorchVideo benchmarks, but it does not seem possible to load it via torch.hub, as it is not listed in hubconf.py.
Is there another way to use the C2D model?

AttributeError: Can't pickle local object 'create_x3d_res_block.<locals>.<lambda>'


πŸ› Bugs / Unexpected behaviors

The hub contains models defined with lambda functions. These should be avoided, as they can't be pickled.


Video and audio sample lengths

I'm trying to sample video clips and audio samples stored in mp4 format by creating a dataset as follows:

sampler = RandomSampler
num_frames = 25
fps = 25
dataset = labeled_encoded_video_dataset(
    data_path=os.path.join(root, "val.csv"),
    clip_sampler=make_clip_sampler("uniform", num_frames / fps),
    video_path_prefix=root,
    transform=val_transform,
    video_sampler=sampler,
)

Given that the duration passed to make_clip_sampler is 1 second (num_frames / fps), I expected 25 video frames and 16,000 audio samples to be returned (sampling rate = 16,000). But instead, each sample consists of 26 video frames and 15,360 audio samples. I was digging into the code in encoded_video_pyav.py, and I'm wondering whether an off-by-one error is being made in the below code by using a "<=" operator instead of a "<"?

  video_frames = [
      f
      for f, pts in self._video
      if pts >= video_start_pts and pts <= video_end_pts
  ]

Also, it seems to me that audio frames, each of length 1024, are concatenated, always producing audio tensors with lengths that are multiples of 1024 (15,360 in this example), instead of lengths corresponding to the duration specified (16,000 in this example).

Is this behaviour expected? How can I get video and audio lengths that correspond to the duration passed to make_clip_sampler above?

Many thanks!
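Not an answer on whether the behaviour is intended, but a workaround sketch for getting fixed-length audio: trim (or zero-pad) the decoded waveform to the number of samples implied by the requested clip duration, since the decoder appears to return whole 1024-sample audio frames.

import torch

def fix_audio_length(audio: torch.Tensor, duration_sec: float, sample_rate: int = 16000) -> torch.Tensor:
    expected = int(round(duration_sec * sample_rate))
    if audio.shape[-1] >= expected:
        return audio[..., :expected]                     # trim extra samples
    pad = expected - audio.shape[-1]
    return torch.nn.functional.pad(audio, (0, pad))      # zero-pad if too short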

ImportError: cannot import name 'slow_r50_detection' from 'pytorchvideo.models.hub

πŸ› Bugs / Unexpected behaviors

when using

model_name = "slowfast_r50"
SLOWFAST50_MODEL = torch.hub.load("facebookresearch/pytorchvideo", model=model_name, pretrained=True)

I came across the following bug:

/root/.cache/torch/hub/facebookresearch_pytorchvideo_master/hubconf.py in <module>()
      2 
      3 dependencies = ["torch"]
----> 4 from pytorchvideo.models.hub import (  # noqa: F401, E402
      5     efficient_x3d_s,
      6     efficient_x3d_xs,
ImportError: cannot import name 'slow_r50_detection' from 'pytorchvideo.models.hub' (/usr/local/lib/python3.7/dist-packages/pytorchvideo/models/hub/__init__.py)

Instructions To Reproduce the Issue:

  1. My code came from https://pytorchvideo.org/docs/tutorial_torchhub_inference.

  2. My environment is google colab.


It seems to be a bug in the official repository. Thanks.
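For what it's worth, one likely explanation (an assumption, not a confirmed diagnosis): torch.hub fetches the latest hubconf.py from master, which imports names such as slow_r50_detection that only exist in newer pytorchvideo releases, so an older locally installed pytorchvideo package fails at import time. Checking the installed version can confirm the mismatch; upgrading pytorchvideo (or installing it from source) would then keep the package and hubconf.py in sync.

import pytorchvideo
from pytorchvideo.models import hub

# If the installed release predates slow_r50_detection, the hub import in hubconf.py
# will keep failing until pytorchvideo is upgraded.
print(getattr(pytorchvideo, "__version__", "unknown"))
print(hasattr(hub, "slow_r50_detection"))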

Bug when using self.save_hyperparameters()

I followed the tutorial to train my own video classification model, but this bug appeared when I tried to use the saved model for inference.

Traceback (most recent call last):
  File "eval.py", line 181, in <module>
    main()
  File "eval.py", line 122, in main
    model = MyLightingModule.load_from_checkpoint(checkpoint_path)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 199, in _load_model_state
    model = cls(**_cls_kwargs)
TypeError: __init__() missing 1 required positional argument: 'args'

I then looked this up online, and the suggested fix was to add self.save_hyperparameters(), as described here:
Lightning-AI/pytorch-lightning#2909
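For context, a minimal sketch of how save_hyperparameters() is meant to be used (assuming the module's configuration is passed as plain __init__ arguments): it records those arguments in the checkpoint so load_from_checkpoint() can rebuild the module without being given 'args' again. Note that everything it captures gets pickled into the checkpoint, so passing unpicklable objects (for example a model built with lambdas) to __init__ can produce exactly the PicklingError shown below.

import pytorch_lightning as pl
import torch
from torch import nn

class MyLightingModule(pl.LightningModule):
    # Hypothetical arguments; use whatever configuration your module actually needs.
    def __init__(self, learning_rate: float = 1e-3, num_classes: int = 400):
        super().__init__()
        self.save_hyperparameters()  # stores learning_rate/num_classes in the checkpoint
        self.head = nn.Linear(2048, num_classes)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)

# Later, no positional 'args' is needed because the hyperparameters were saved:
# model = MyLightingModule.load_from_checkpoint(checkpoint_path)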

However, after I added this line and retrained, another error appeared. Can someone tell me why?

Traceback (most recent call last):
  File "train_frame.py", line 630, in <module>
    main()
  File "train_frame.py", line 610, in main
    train(args)
  File "train_frame.py", line 617, in train
    trainer.fit(classification_module, data_module)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 584, in run_training_epoch
    self.trainer.run_evaluation(on_epoch=True)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1006, in run_evaluation
    self.evaluation_loop.on_evaluation_end()
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 102, in on_evaluation_end
    self.trainer.call_hook('on_validation_end', *args, **kwargs)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1223, in call_hook
    trainer_hook(*args, **kwargs)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/callback_hook.py", line 227, in on_validation_end
    callback.on_validation_end(self, self.lightning_module)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 249, in on_validation_end
    self.save_checkpoint(trainer)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 298, in save_checkpoint
    self._save_top_k_checkpoint(trainer, monitor_candidates)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 669, in _save_top_k_checkpoint
    self._update_best_and_save(current, trainer, monitor_candidates)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 730, in _update_best_and_save
    self._save_model(trainer, filepath)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 449, in _save_model
    self._do_save(trainer, filepath)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 460, in _do_save
    trainer.save_checkpoint(filepath, self.save_weights_only)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/properties.py", line 330, in save_checkpoint
    self.checkpoint_connector.save_checkpoint(filepath, weights_only)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 392, in save_checkpoint
    self.trainer.accelerator.save_checkpoint(_checkpoint, filepath)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 516, in save_checkpoint
    self.training_type_plugin.save_checkpoint(checkpoint, filepath)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 256, in save_checkpoint
    atomic_save(checkpoint, filepath)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/pytorch_lightning/utilities/cloud_io.py", line 64, in atomic_save
    torch.save(checkpoint, bytesbuffer)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 379, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/torch/serialization.py", line 484, in _save
    pickler.dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7ff195ab3b80>: attribute lookup <lambda> on pytorchvideo.models.resnet failed
Exception ignored in: <function tqdm.__del__ at 0x7ff1b151cdc0>
Traceback (most recent call last):
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1145, in __del__
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1299, in close
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1492, in display
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1148, in __str__
  File "/data1/thorqian/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1450, in format_dict
TypeError: cannot unpack non-iterable NoneType object

EncodedVideoDataset OOM

Hi,

I am trying to use EncodedVideoDataset with Voxceleb2, which features over 1 million videos of 3-20 seconds each. Currently, this makes me run out of RAM and my process gets killed (I checked dmesg -T to make sure, and this is indeed what is happening). I have 32 GB of RAM on my machine, which should not be that low, I suppose, but I guess this is quite a lot of data as well.

Is there any workaround for this, or am I absolutely forced to move to a machine with more RAM?

Thanks a lot in advance.

Can pytorchvideo support Python3.6?

Hi, can pytorchvideo support Python 3.6? I'm running my program in a Docker container where I cannot change the Python version, and I hope there is a way to work around this minor version difference.

Fine-tuning pre-trained models on AVA dataset?

Hi!

I would like to fine-tune a pre-trained model on data in the AVA dataset format. How can I achieve this with pytorchvideo?
The current tutorial only shows how to run inference with already fine-tuned models.

Thank you!
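For reference, one possible starting point is sketched below (assumptions: the slow_r50_detection torch.hub entrypoint is available in your pytorchvideo version, you already have an AVA-format dataloader yielding clips, boxes and labels, and the name filter used to pick the head parameters may need adjusting after inspecting the printed module names):

import torch

model = torch.hub.load(
    "facebookresearch/pytorchvideo", model="slow_r50_detection", pretrained=True
)

# Print the top-level submodules once to locate the detection head in this version.
for name, _ in model.named_children():
    print(name)

# Freeze everything except the (assumed) head parameters and fine-tune only those.
head_keyword = "head"  # adjust based on the printout above
for name, param in model.named_parameters():
    param.requires_grad = head_keyword in name

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-2, momentum=0.9
)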

pytorchvideo.data.labeled_video_dataset: Failed to load video with error: video/_104.avi not found.; trial 0

Hi everyone,
I am trying to use a custom dataset with pytorchvideo, following the training tutorial, but I am getting the following error: pytorchvideo.data.labeled_video_dataset: Failed to load video with error: video/_104.avi not found.; trial 0
I created the csv files formatted as described in the data preparation tutorial, i.e. "path_to_video label".
The data path I specified is the directory containing train.csv and val.csv; that same directory also contains a "video" folder with the videos, which matches the path in the error message ("video/_104.avi").

How can I fix this?

Thanks!
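One thing worth checking (this is an assumption about the cause, not a confirmed diagnosis): relative paths in the csv are joined with the video_path_prefix argument rather than with the csv's own directory, so passing the dataset root as the prefix should make "video/_104.avi" resolve to an existing absolute path. A minimal sketch:

import os

from pytorchvideo.data import labeled_video_dataset, make_clip_sampler

root = "/path/to/dataset"  # hypothetical: the directory holding train.csv, val.csv and video/

dataset = labeled_video_dataset(
    data_path=os.path.join(root, "train.csv"),
    clip_sampler=make_clip_sampler("random", 2.0),
    video_path_prefix=root,  # prepended to every relative path from the csv
    decode_audio=False,
)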
