Giter Site home page Giter Site logo

vpt360's Introduction

Transformer-based Long-Term Viewport Prediction in 360° Video: Scanpath is All You Need

Abstract

Virtual Reality (VR) multimedia technology has dramatically advanced in recent years. Its immersive and interactive natures enable users to view any direction in 360° content freely. Users do not see the entire 360° content at a glance, but only a portion in the viewport. Viewport-based adaptive streaming, which streams only the user’s viewport of interest with high quality, has emerged as the primary technique to save bandwidth over the best-effort Internet. Thus, users’ viewport prediction in the forthcoming seconds becomes an essential task for informing the streaming decisions in the VR system. Various viewport prediction methods based on deep neural networks have been proposed. However, typically they are composed of complex Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) that require heavy computation. To achieve high prediction accuracy in limited computation time in a streaming system, we propose a new transformer-based architecture, named 360° Viewport Prediction Transformer (VPT360), that only leverages the past viewport scanpath to predict a user’s future viewport scanpath. We evaluate VPT360 over three widely-used datasets and compare the computation complexity with the state-of-the-art methods. The experiments show that our VPT360 provides the highest accuracy for short-term and long-term prediction and achieves the lowest computation complexity. The code is publicly available at https://github.com/FannyChao/VPT360 to further contribute to the community.

To Be Continued...

Citing

@INPROCEEDINGS{VPT360,
  author={F. -Y. {Chao} and C. {Ozcinar} and A. {Smolic}},
  booktitle={IEEE 23nd International Workshop on Multimedia Signal Processing (MMSP)}, 
  title={Transformer-based Long-Term Viewport Prediction in 360° Video: Scanpath is All You Need}, 
  year={2021},
  volume={},
  number={},
  pages={},
  doi={}}

vpt360's People

Contributors

fannychao avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

vpt360's Issues

About code

I did not find the code in the GitHub repository.

Code for paper reproducibility

Hello, I've come across your paper 'Scanpath is all you need' and It linked me here for the project code. But I noticed there are only figures. Is there any way you could provide the code and dataset for reproduction of your implementation?

No code.

Hello. I didn't find the code section, can you upload it?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.