ificl / slfm

Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

Home Page: https://ificl.github.io/SLfM/

License: MIT License

Python 95.32% Shell 4.68%

slfm's Introduction

Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

Ziyang Chen, Shengyi Qian, Andrew Owens
University of Michigan, Ann Arbor
ICCV 2023


This repository contains the official codebase for Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation. Project page: https://ificl.github.io/SLfM/

SLfM Illustration

Environment

To set up the environment, run:

conda env create -f environment.yml
conda activate SLfM
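
As a quick sanity check before launching any script, the snippet below tests whether the expected conda environment is active. It is a minimal sketch, not part of the repository: it assumes the environment is named `SLfM` (taken from the `conda activate` command above) and relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation. The helper names `active_conda_env` and `check_env` are hypothetical.

```python
import os

# Assumption: the environment name matches `conda activate SLfM` above.
EXPECTED_ENV = "SLfM"


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if unset."""
    return environ.get("CONDA_DEFAULT_ENV")


def check_env(environ=os.environ, expected=EXPECTED_ENV):
    """Return True if the expected conda environment is the active one."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if check_env():
        print(f"conda environment '{EXPECTED_ENV}' is active")
    else:
        print(f"expected '{EXPECTED_ENV}', found {active_conda_env()!r}")
```

Passing a plain dictionary as `environ` makes the check easy to exercise without touching the real process environment.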

Datasets

LibriSpeech

We use speech samples from this dataset to render binaural audio. Data can be downloaded from here. Please see Dataset/LibriSpeech for more processing details.

Free Music Archive (FMA)

We use audio samples from this dataset to render binaural audio. Data can be downloaded from the official FMA GitHub repo. Please see Dataset/Free-Music-Archive for more processing details.

HM3D-SS

We use the SoundSpaces 2.0 platform and the Habitat-Matterport 3D dataset to create our audio-visual dataset, HM3D-SS. Please follow the installation guide from here.

We provide the code for generating the dataset under Dataset/AI-Habitat. To create the HM3D-SS dataset, run:

cd Dataset/AI-Habitat
# Check the bash files before running: they require you to specify the output directory
sh ./multi-preprocess.sh
sh ./multi-postprocess.sh

Demo Videos

We also provide self-recorded real-world videos under Dataset/DemoVideos/RawVideos. The videos were recorded with an iPhone 14 Pro, and the binaural audio was recorded with a Sennheiser AMBEO Smart Headset. The demo videos are for research purposes only.

Pretrained Models

We will release several models pretrained with our proposed methods, which we hope will benefit the research community. To download all the checkpoints, run:

cd slfm
sh ./scripts/download_models.sh

Train & Evaluation

We provide training and evaluation scripts under scripts. Please check each bash file before running.

  • To train and evaluate our SLfM cross-view binauralization pretext task and perform linear probing experiments, simply run:
cd slfm
sh ./scripts/training/slfm-pretext.sh
  • To train and evaluate our SLfM model with frozen embeddings from the pretext task, simply run:
cd slfm
sh ./scripts/training/slfm-geometric.sh
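
For readers unfamiliar with the linear probing mentioned in the first step above, the following toy sketch shows the general idea: the embeddings are held fixed and only a linear readout is trained on top of them. This is not the repository's implementation, which lives in the training scripts. The function names, the 2-D "embeddings", and the synthetic targets are all hypothetical and chosen only for illustration.

```python
import random


def train_linear_probe(embeddings, targets, lr=0.1, epochs=500):
    """Fit y ~ w.z + b by gradient descent on mean squared error.

    The embeddings are treated as constants: only w and b are updated,
    which is what makes this a probe of a frozen representation.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for z, y in zip(embeddings, targets):
            err = sum(wi * zi for wi, zi in zip(w, z)) + b - y
            for i in range(dim):
                grad_w[i] += 2.0 * err * z[i] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, z):
    """Apply the trained linear readout to a single embedding."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


# Toy data (hypothetical): stand-ins for frozen embeddings, with a target
# that is exactly linear in them so the probe can recover it.
rng = random.Random(0)
embeddings = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
targets = [0.5 * z[0] - 0.25 * z[1] + 0.1 for z in embeddings]

w, b = train_linear_probe(embeddings, targets)
```

In practice the embeddings would come from the pretrained network with its weights frozen, and how well such a simple readout predicts the target indicates how much of that information the representation already encodes.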

Citation

If you find this code useful, please consider citing:

@inproceedings{
    chen2023sound,
    title={Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation},
    author={Chen, Ziyang and Qian, Shengyi and Owens, Andrew},
    booktitle = {ICCV},
    year={2023}
}

Acknowledgment

This work was funded in part by DARPA Semafor and Sony. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
