Giter Site home page Giter Site logo

anton-jeran / fast-rir Goto Github PK

View Code? Open in Web Editor NEW
137.0 6.0 25.0 4.57 MB

This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Home Page: https://anton-jeran.github.io/FRIR/

License: GNU Affero General Public License v3.0

Python 99.72% Shell 0.28%
room-impulse-response automatic-speech-recognition acoustics conditional-generation generative-adversarial-network speech augmentation synthetic-data rir neural-network

fast-rir's Introduction

FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR (ICASSP 2022)

This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given rectangular acoustic environment. Our model is inspired by StackGAN architecture. The audio examples and spectrograms of the generated RIRs are available here.

NEWS:

1) We have generalized our FAST-RIR to generate RIRs for any 3D indoor scenes represented using meshes. Official code of our network MESH2IR is available.

2) We release MULTI-CHANNEL MULTI-SPEAKER MULTI-SPATIAL AUDIO CODEC. The official code of our network M3-AUDIODEC is available.

Requirements

Python3.6
Pytorch
python-dateutil
easydict
pandas
torchfile
gdown
librosa
soundfile
acoustics
wavefile
wavfile
pyyaml==5.4.1
pickle

Embedding

Each normalized embedding is created as follows: If you are using our trained model, you may need to use extra parameter Correction(CRR).

Listener Position = LP
Source Position = SP
Room Dimension = RD
Reverberation Time = T60
Correction = CRR

CRR = 0.1 if 0.5<T60<0.6
CRR = 0.2 if T60>0.6
CRR = 0 otherwise

Embedding = ([LP_X,LP_Y,LP_Z,SP_X,SP_Y,SP_Z,RD_X,RD_Y,RD_Z,(T60+CRR)] /5) - 1

Generete RIRs using trained model

Download the trained model using this command

source download_generate.sh

Create normalized embeddings list in pickle format. You can run following command to generate an example embedding list

 python3 example1.py

Run the following command inside code_new to generate RIRs corresponding to the normalized embeddings list. You can find generated RIRs inside code_new/Generated_RIRs

python3 main.py --cfg cfg/RIR_eval.yml --gpu 0

Range

Our trained NN-DAS is capable of generating RIRs with the following range accurately.

Room Dimension X --> 8m to 11m
Room Dimesnion Y --> 6m to 8m
Room Dimension Z --> 2.5m to 3.5m
Listener Position --> Any position within the room
Speaker Position --> Any position within the room
Reverberation time --> 0.2s to 0.7s

Training the Model

Run the following command to download the training dataset we created using a Diffuse Acoustic Simulator. You also can train the model using your dataset.

source download_data.sh

Run the following command to train the model. You can pass what GPUs to be used for training as an input argument. In this example, I am using 2 GPUs.

python3 main.py --cfg cfg/RIR_s1.yml --gpu 0,1

Related Works

  1. IR-GAN: Room Impulse Response Generator for Far-field Speech Recognition (INTERSPEECH2021)
  2. TS-RIR: Translated synthetic room impulse responses for speech augmentation (IEEE ASRU 2021)

Citations

If you use our FAST-RIR for your research, please consider citing

@INPROCEEDINGS{9747846, 
author={Ratnarajah, Anton and Zhang, Shi-Xiong and Yu, Meng and Tang, Zhenyu and Manocha, Dinesh and Yu, Dong}, 
booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Fast-Rir: Fast Neural Diffuse Room Impulse Response Generator},
year={2022}, 
volume={},
number={},
pages={571-575},
doi={10.1109/ICASSP43922.2022.9747846}}

Our work is inspired by

@inproceedings{han2017stackgan,
Author = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
Title = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
Year = {2017},
booktitle = {{ICCV}},
}

If you use our training dataset generated using Diffuse Acoustic Simulator in your research, please consider citing

@inproceedings{9052932,
  author={Z. {Tang} and L. {Chen} and B. {Wu} and D. {Yu} and D. {Manocha}},  
  booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},  
  title={Improving Reverberant Speech Training Using Diffuse Acoustic Simulation},   
  year={2020},  
  volume={},  
  number={},  
  pages={6969-6973},
}

fast-rir's People

Contributors

anton-jeran avatar jinzr avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

fast-rir's Issues

how to generate multi channel RIRs for micphone array?

Is there any way to generate multi channel RIRs for micphone array, I try to generated 2 sources and 8 micphones mono RIR and stack them together with order of name, but get bad result, some of them look like normal, the rest look like interleave between 2 sources.

by the way, can this rep generate large room, such as 12m*12m?

DAS use question

Hi, how can I use DAS to generate RIR conforming to the given RT60?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.