ishine / pseudobinaural_cvpr2021

This project forked from sheldontsui/pseudobinaural_cvpr2021

Codebase for the paper "Visually Informed Binaural Audio Generation without Binaural Audios" (CVPR 2021)

License: Creative Commons Attribution 4.0 International

Visually Informed Binaural Audio Generation without Binaural Audios (CVPR 2021)

Xudong Xu*, Hang Zhou*, Ziwei Liu, Bo Dai, Xiaogang Wang, and Dahua Lin

Stereophonic audio, especially binaural audio, plays an essential role in immersive viewing environments. Recent research has explored generating stereophonic audio guided by visual cues and multi-channel audio collections in a fully-supervised manner. However, due to the requirement of professional recording devices, existing datasets are limited in scale and variety, which impedes the generalization of supervised methods to real-world scenarios. In this work, we propose PseudoBinaural, an effective pipeline that is free of binaural recordings. The key insight is to carefully build pseudo visual-stereo pairs with mono data for training. Specifically, we leverage spherical harmonic decomposition and head-related impulse response (HRIR) to identify the relationship between the location of a sound source and the received binaural audio. Then, in the visual modality, corresponding visual cues of the mono data are manually placed at sound source positions to form the pairs. Compared to fully-supervised paradigms, our binaural-recording-free pipeline shows great stability in cross-dataset evaluation and comparable performance under subjective preference. Moreover, when combined with binaural recorded data, our method further boosts the performance of binaural audio generation under supervised settings.
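The audio side of this idea (place a mono source at a direction, describe the resulting sound field with spherical harmonics, then map it to the two ears with HRIRs) can be illustrated with a minimal first-order ambisonics sketch. It is not the repository's implementation. It assumes ACN/SN3D first-order encoding, a basic projection decoder onto a ring of virtual loudspeakers, an azimuth convention where positive means "to the left", and a hypothetical `toy_hrir` that only models a level difference and a small far-ear delay in place of measured HRIRs.

```python
import numpy as np


def sh_first_order(azimuth, elevation):
    """Real first-order spherical harmonics in ACN order (W, Y, Z, X), SN3D-normalised."""
    return np.array([
        1.0,                                  # W: omnidirectional (order 0)
        np.sin(azimuth) * np.cos(elevation),  # Y: left/right
        np.sin(elevation),                    # Z: up/down
        np.cos(azimuth) * np.cos(elevation),  # X: front/back
    ])


def encode_mono(mono, azimuth, elevation):
    """Place a mono signal at (azimuth, elevation); returns 4 ambisonic channels, shape (4, T)."""
    return np.outer(sh_first_order(azimuth, elevation), mono)


def decode_to_speakers(ambi, speaker_dirs):
    """Basic projection decoder onto L virtual loudspeakers; output shape (L, T).

    SN3D channels of order n are weighted by (2n + 1): 1 for W, 3 for Y, Z and X.
    """
    order_weights = np.array([1.0, 3.0, 3.0, 3.0])
    gains = np.stack([sh_first_order(az, el) * order_weights for az, el in speaker_dirs])
    return gains @ ambi / len(speaker_dirs)


def toy_hrir(azimuth, length=16, max_delay=3):
    """HYPOTHETICAL stand-in for a measured (left, right) HRIR pair.

    Models only a level difference and a small delay at the far ear;
    a real pipeline would load measured HRIRs for each speaker direction.
    """
    s = np.sin(azimuth)  # +1 = fully left, -1 = fully right
    left, right = np.zeros(length), np.zeros(length)
    left[int(round(max_delay * max(0.0, -s)))] = 0.5 * (1 + 0.8 * s)
    right[int(round(max_delay * max(0.0, s)))] = 0.5 * (1 - 0.8 * s)
    return left, right


def render_binaural(mono, azimuth, elevation, speaker_dirs):
    """Mono source -> ambisonics -> virtual speakers -> HRIR convolution -> (2, T') signal."""
    feeds = decode_to_speakers(encode_mono(mono, azimuth, elevation), speaker_dirs)
    left = sum(np.convolve(f, toy_hrir(az)[0]) for f, (az, _) in zip(feeds, speaker_dirs))
    right = sum(np.convolve(f, toy_hrir(az)[1]) for f, (az, _) in zip(feeds, speaker_dirs))
    return np.stack([left, right])
```

With eight evenly spaced horizontal speakers, the decoder gains sum to one, so the omnidirectional level of the mono source is preserved while its direction is redistributed across speakers. Replacing `toy_hrir` with measured HRIRs is what would give the output realistic spatial cues.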

[Project] [Paper] [Demo]

Requirements

  • Python 3.7 is used. Basic requirements are listed in 'requirements.txt' and can be installed with:
pip install -r requirements.txt

Dataset

FAIR-Play can be accessed here. MUSIC21 can be accessed here. YT-Music can be accessed here.

Training and Testing

All the training and testing bash scripts can be found in './scripts'. For the FAIR-Play dataset, we create five non-overlapping splits in the folder 'new_splits', as described in the paper. Before training, please replace each 'xxxxxx.mp3' entry in these splits with its absolute path, and make sure the 'audio_resave' and 'frames' folders are located in the same directory. Note that each item in 'data/mono_sources' consists of an audio file together with its cropped object patch: for each video in 'data/mono_sources', I crop the object out and store the patches in the 'new_patches' folder. The cropping was done with the Faster R-CNN model from this repo. The model trained on the non-overlapping split1 can be found here.
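The path preparation step can be scripted. The sketch below assumes, hypothetically, that each split is a plain-text file with one audio file name per line and that the dataset root holds 'audio_resave' and 'frames' side by side; if the files in 'new_splits' use another format (for example HDF5), the same rewrite applies to whichever field stores the paths. The helper names `absolutize_split` and `check_layout` are illustrative and not part of the repository.

```python
from pathlib import Path


def absolutize_split(split_file, audio_dir):
    """Rewrite each entry of a split file as an absolute path under audio_dir.

    HYPOTHETICAL format: one entry (e.g. '000001.mp3') per non-empty line.
    Entries that are already absolute are kept unchanged, so reruns are safe.
    """
    audio_dir = Path(audio_dir).resolve()
    split_file = Path(split_file)
    entries = [ln.strip() for ln in split_file.read_text().splitlines() if ln.strip()]
    absolute = [str(audio_dir / e) for e in entries]  # '/' keeps absolute entries as-is
    split_file.write_text("\n".join(absolute) + "\n")
    return absolute


def check_layout(dataset_root):
    """Raise if 'audio_resave' and 'frames' are not sibling folders under dataset_root."""
    root = Path(dataset_root)
    missing = [name for name in ("audio_resave", "frames") if not (root / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing folders under {root}: {missing}")
```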

We tried two different schemes for creating the pseudo visual-stereo pairs. One pastes the visual patches onto a pre-defined background image and uses Poisson blending to refine the boundaries. The other places the visual patches on an empty background. We found that the empty-background scheme performs slightly better than the blending one.
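The empty-background scheme can be sketched as pasting each cropped patch onto a blank canvas at a horizontal position that reflects its assigned sound-source direction. The linear azimuth-to-column mapping, the 224 x 448 canvas, the vertical centring and the field of view are all hypothetical choices for illustration, not values from the repository; positive azimuth is taken to mean "to the left". For the blending variant, OpenCV's `cv2.seamlessClone` implements Poisson blending and could replace the direct paste.

```python
import numpy as np


def azimuth_to_column(azimuth, canvas_width, patch_width, fov=np.pi):
    """HYPOTHETICAL linear map: +fov/2 (far left) -> column 0, -fov/2 (far right) -> right edge."""
    frac = min(max(0.5 - azimuth / fov, 0.0), 1.0)
    return int(round(frac * (canvas_width - patch_width)))


def place_on_empty_background(patches, azimuths, height=224, width=448, fov=np.pi):
    """Paste HxWx3 uint8 patches onto a black canvas; later patches overwrite earlier ones."""
    canvas = np.zeros((height, width, 3), dtype=np.uint8)
    for patch, az in zip(patches, azimuths):
        ph, pw = patch.shape[:2]
        x0 = azimuth_to_column(az, width, pw, fov)
        y0 = (height - ph) // 2  # vertically centred (assumption)
        canvas[y0:y0 + ph, x0:x0 + pw] = patch
    return canvas
```

The azimuths passed here should be the same ones used to spatialize the corresponding mono audio, so that each patch's position in the image and the source direction in the pseudo-binaural track agree.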

License and Citation

This software is released under CC-BY-4.0.

@inproceedings{xu2021visually,
  title={Visually Informed Binaural Audio Generation without Binaural Audios},
  author={Xu, Xudong and Zhou, Hang and Liu, Ziwei and Dai, Bo and Wang, Xiaogang and Lin, Dahua},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)},
  year={2021}
}

Acknowledgement

The structure of this codebase is borrowed from 2.5D Visual Sound and SepStereo.
