
commavq's Introduction

Source video: source_video.mp4
Compressed video: compressed_video.mp4
Future prediction: generated.mp4

A world model is a model that can predict the next state of the world given the observed previous states and actions.

World models are essential to training all kinds of intelligent agents, especially self-driving models.

commaVQ contains:

  • encoder/decoder models used to heavily compress driving scenes
  • a world model trained on 3,000,000 minutes of driving videos
  • a dataset of 100,000 minutes of compressed driving videos

Task

Lossless compression challenge: make me smaller! $500 challenge

Losslessly compress 5,000 minutes of driving video "tokens". Go to ./compression/ to get started.

Prize: awarded for the highest compression rate on 5,000 minutes of driving video (~915MB). Challenge ended July 1st, 2024, 11:59pm AoE.

Submit a single zip file containing the compressed data and a Python script to decompress it into its original form. Top solutions are listed on comma's official leaderboard.
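For reference, the leaderboard metric is presumably uncompressed size divided by compressed size, so the arithmetic behind a given rate is simple:

```python
# Back-of-the-envelope scoring: rate = uncompressed bytes / compressed bytes.
original_mb = 915            # ~915MB of driving video tokens
baseline_rate = 1.6          # lzma baseline from the leaderboard
compressed_mb = original_mb / baseline_rate
assert round(compressed_mb, 3) == 571.875  # the baseline submission is ~572MB
```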

Implementation                                 Compression rate
pkourouklidis (arithmetic coding with GPT)     2.6
anonymous (zpaq)                               2.3
rostislav (zpaq)                               2.3
anonymous (zpaq)                               2.2
anonymous (zpaq)                               2.2
0x41head (zpaq)                                2.2
tillinf (zpaq)                                 2.2
baseline (lzma)                                1.6

Overview

A VQ-VAE [1,2] was used to heavily compress each video frame into 128 "tokens" of 10 bits each. Each entry of the dataset is a "segment" of compressed driving video, i.e. 1 minute of frames at 20 FPS. Each file has shape 1200x8x16 (frames x rows x columns, with 8x16 = 128 tokens per frame) and is saved as int16.
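Concretely, a loaded segment can be sanity-checked like this (a zero array stands in here for a real .npy file from the dataset):

```python
import numpy as np

# Stand-in for np.load("<segment>.npy"): one minute of compressed video.
tokens = np.zeros((1200, 8, 16), dtype=np.int16)

frames, rows, cols = tokens.shape
assert frames == 20 * 60              # 1 minute at 20 FPS
assert rows * cols == 128             # 128 tokens per frame
assert tokens.itemsize == 2           # saved as int16
assert 0 <= tokens.min() and tokens.max() < 2 ** 10  # 10-bit codebook indices
```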

A world model [3] was trained to predict the next token given a context of past tokens. This world model is a Generative Pre-trained Transformer (GPT) [4] trained on 3,000,000 minutes of driving videos following a similar recipe to [5].
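To make the next-token setup concrete, here is a toy autoregressive decoding loop. `dummy_model` (random logits) is a stand-in, not the released GPT; only the sampling structure is meant to mirror how future frames are imagined token by token:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 1024  # 10-bit token vocabulary

def dummy_model(context):
    """Stand-in for the world model: returns logits over the next token."""
    return rng.normal(size=VOCAB)

def generate_frame(context, tokens_per_frame=128):
    """Sample one future frame token by token, feeding each prediction
    back into the context (autoregressive decoding)."""
    context = list(context)
    for _ in range(tokens_per_frame):
        logits = dummy_model(context)
        probs = np.exp(logits - logits.max())  # softmax over the vocabulary
        probs /= probs.sum()
        context.append(int(rng.choice(VOCAB, p=probs)))
    return np.array(context[-tokens_per_frame:], dtype=np.int16)

frame = generate_frame(np.zeros(128, dtype=np.int16))
assert frame.shape == (128,) and 0 <= frame.min() and frame.max() < VOCAB
```

The real model additionally uses special tokens (e.g. a BOS token per frame, see the gpt.ipynb notebook), which this sketch omits.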

Examples

See ./notebooks/encode.ipynb and ./notebooks/decode.ipynb for an example of how to visualize the dataset using a segment of driving video from comma's drive to Taco Bell.

See ./notebooks/gpt.ipynb for an example of how to use the world model to imagine future frames.

See ./compression/compress.py for an example of how to compress the tokens using lzma.
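A minimal version of the lzma baseline idea, compressing the raw int16 token bytes and verifying the lossless round trip (the real compress.py may differ in details):

```python
import lzma

import numpy as np

# Toy stand-in for one segment: random 10-bit tokens stored as int16.
tokens = np.random.default_rng(0).integers(0, 1024, size=(1200, 8, 16), dtype=np.int16)

raw = tokens.tobytes()
compressed = lzma.compress(raw, preset=9)
rate = len(raw) / len(compressed)

# Decompression must reproduce the tokens exactly (lossless).
restored = np.frombuffer(lzma.decompress(compressed), dtype=np.int16).reshape(tokens.shape)
assert np.array_equal(restored, tokens)
assert rate > 1.0  # the zeroed high bits alone make int16 tokens compressible
```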

Download the dataset

  • Using huggingface datasets
import numpy as np
from datasets import load_dataset
num_proc = 40 # CPUs go brrrr
ds = load_dataset('commaai/commavq', num_proc=num_proc)
tokens = np.load(ds['0'][0]['path']) # first segment from the first data shard

References

[1] Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." Advances in neural information processing systems 30 (2017).

[2] Esser, Patrick, Robin Rombach, and Bjorn Ommer. "Taming transformers for high-resolution image synthesis." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.

[3] https://worldmodels.github.io/

[4] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).

[5] Micheli, Vincent, Eloi Alonso, and François Fleuret. "Transformers are Sample-Efficient World Models." The Eleventh International Conference on Learning Representations (ICLR). 2023.

commavq's People

Contributors

adeebshihadeh, elithecoder, grekiki2, hcnguyen111, incognitojam, yassineyousfi


commavq's Issues

Inference challenge

Is this still active? I made some progress (a 30-50% speed improvement with the same model), but it is still pretty far from 2x, so let me know if it is still relevant.

"No such file or directory" when using HuggingFace dataset downloader

On running the code sample from the README, or compression/compress.py, Python throws FileNotFoundError: [Errno 2] No such file or directory: '3b41c0fa8959aea6c118e5714f412a2e_13.npy' (the files are not downloaded to the expected location). I did check ~/.cache/huggingface/datasets/commaai___commavq and found some .arrow files, but the names do not match those specified by the dataset. Some help troubleshooting this would be much appreciated. Thanks!

InvalidProtobuf

Error description:
InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /home/mojo/dev/commavq-challenge/commavq/models/decoder.onnx failed:Protobuf parsing failed.

This happens when I try to load the decoder model.
When I try the checker

onnx.checker.check_model('../models/decoder.onnx')

I get the following error:
ValidationError: Unable to parse proto from file: /home/mojo/dev/commavq-challenge/commavq/models/decoder.onnx. Please check if it is a valid protobuf file of proto.

I am not sure whether the ONNX file is broken or this is an ONNX problem.

I tried onnxruntime-gpu and onnx 1.14 and 1.15, both times the same problem.

Anyone else experiencing the same problems?

Could this be the problem?
https://github.com/onnx/onnx/blob/09a4e65bb098164491b021ffe563a559fbc1a808/docs/ExternalData.md

I have a bug in gpt.ipynb. Can you help me?

File "notebooks/gpt.ipynb": y = model.generate(idx, config.tokens_per_frame)
RuntimeError: Index put requires the source and destination dtypes match, got Half for the destination and BFloat16 for the source.
I don't know what the problem is; I guess it might be an environment issue. What is the runtime environment for your code? I also found that I couldn't load gpt2m @ 12f0a5e. What is this?

How is the pose data used in conditioning during inference?

The pose data are given as 6 real-valued numbers per frame. How are these values used to condition the model?

The talk mentions using np.digitize to tokenize the poses, which would allow a subsequent pass through the embedding layer. But what are the bin values?

And are the pose values prepended before the BOS token during inference for every frame we need to condition? What does this mean for the maximum context length of the model? How would we condition it again after doing so once at the start?
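For reference, the np.digitize step mentioned above would look roughly like this; the bin edges here are made up for illustration, since the actual ones used in training are not published:

```python
import numpy as np

# Hypothetical uniform bins; the real edges are unknown.
bins = np.linspace(-1.0, 1.0, num=255)  # 255 edges -> 256 buckets (ids 0..255)
pose = np.array([0.01, -0.2, 0.0, 0.5, -0.9, 0.03])  # one frame's 6-DOF values
pose_tokens = np.digitize(pose, bins)   # integer ids, ready for an embedding layer
assert pose_tokens.shape == (6,)
assert pose_tokens.min() >= 0 and pose_tokens.max() <= 255
```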

Can't Download Pretrained Models

Command run:

git lfs fetch --all

Result:

fetch: 9 object(s) found, done.
fetch: Fetching all references...
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/commaai/commavq.git/info/lfs'

Maybe put them on academic torrents?

Request for 6-DOF Pose Information for Each Frame

I kindly request the project's author to provide 6-DOF pose information for each frame. The current implementation is useful, but I also need this additional data.

Please consider adding an option or API that allows accessing the 6-DOF pose information for every frame. This would greatly enhance the project's capabilities and benefit users with similar needs.

Thank you for considering this feature request.

Access to training code, provide incentives

This approach seems very interesting for other use cases.
For example, I would like to VQ videos of sign language rather than driving.
If the training code were available (including fast video decoding, data augmentation, and the training strategy), one could replicate your models in a different domain.

Why would you care?
If I can replicate this in my domain, I have a strong incentive to make this repo more performant.

[ONNXRuntimeError] : 7 : INVALID_PROTOBUF : while loading encoder.onnx

The protobuf version

(base) a@t4:~$ protoc --version
libprotoc 3.12.4

ONNX runtime

In [1]: import onnxruntime as ort

In [2]: ort.__version__
Out[2]: '1.16.0'

Python version

(base) a@t4:~$ python --version
Python 3.10.12

Traceback

In [6]: options = ort.SessionOptions()
   ...: provider = 'CUDAExecutionProvider'
   ...: session = ort.InferenceSession('/home/a/commavq/gpt2m/encoder.onnx', options, [provider])
/opt/conda/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:69: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
  warnings.warn(
---------------------------------------------------------------------------
InvalidProtobuf                           Traceback (most recent call last)
Cell In[6], line 3
      1 options = ort.SessionOptions()
      2 provider = 'CUDAExecutionProvider'
----> 3 session = ort.InferenceSession('/home/a/commavq/gpt2m/encoder.onnx', options, [provider])

File /opt/conda/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:419, in InferenceSession.__init__(self, path_or_bytes, sess_options, providers, provider_options, **kwargs)
    416 disabled_optimizers = kwargs["disabled_optimizers"] if "disabled_optimizers" in kwargs else None
    418 try:
--> 419     self._create_inference_session(providers, provider_options, disabled_optimizers)
    420 except (ValueError, RuntimeError) as e:
    421     if self._enable_fallback:

File /opt/conda/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:460, in InferenceSession._create_inference_session(self, providers, provider_options, disabled_optimizers)
    458 session_options = self._sess_options if self._sess_options else C.get_default_session_options()
    459 if self._model_path:
--> 460     sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
    461 else:
    462     sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)

InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /home/a/commavq/gpt2m/encoder.onnx failed:Protobuf parsing failed.

eval.ipynb

"model" is not defined

on line 22
# your model here!
pred = model(x)

Inference Bounty

Hey @YassineYousfi!

Just learnt about this from your talk being (re)released, very cool!

Spent some time on the inference bounty, getting 0.2s per frame on a 4090 and 0.28s on a 3090. There may be more I can squeeze out, but I'm not certain.

Some quick questions:

  • Is the bounty still open?
  • Does a pure PyTorch solution with some startup time due to compilation fit your criteria?
  • If yes to the above two, what's the standard process re: bounties?
