
Audio Deep Fake Detection

A Course Project for SUTD 50.039 Theory and Practice of Deep Learning (2022 Spring)

Created by Mark He Huang, Peiyuan Zhang, James Raphael Tiovalen, Madhumitha Balaji, and Shyam Sridhar.

Check out our: Project Report | Interactive Website (https://markhh.com/AudioDeepFakeDetection/)

Setup Environment

# Set up Python virtual environment
python3 -m venv venv && source venv/bin/activate

# Make sure pip, wheel, and setuptools are up to date
pip install -U pip wheel setuptools

# Install required dependencies
pip install -r requirements.txt

Setup Datasets

You may download the datasets used in this project from the following sources:

  • (Real) Human Voice Dataset: LJ Speech (v1.1)
    • This dataset consists of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books.
  • (Fake) Synthetic Voice Dataset: WaveFake (v1.20)
    • The dataset consists of 104,885 generated audio clips (16-bit PCM wav).

After downloading the datasets, you may extract them under data/real and data/fake respectively. In the end, the data directory should look like this:

data
├── real
│   └── wavs
└── fake
    ├── common_voices_prompts_from_conformer_fastspeech2_pwg_ljspeech
    ├── jsut_multi_band_melgan
    ├── jsut_parallel_wavegan
    ├── ljspeech_full_band_melgan
    ├── ljspeech_hifiGAN
    ├── ljspeech_melgan
    ├── ljspeech_melgan_large
    ├── ljspeech_multi_band_melgan
    ├── ljspeech_parallel_wavegan
    └── ljspeech_waveglow
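
For orientation, here is a minimal sketch, not the project's actual dataset code, of how this layout could be walked into (path, label) pairs, assuming the common convention of labeling real clips 0 and fake clips 1:

# Minimal sketch only; the repo's own dataset classes are authoritative.
# Assumes the convention real = 0, fake = 1.
from pathlib import Path

def collect_samples(real_dir="data/real", fake_dir="data/fake"):
    """Walk both trees and pair every .wav file with its label."""
    samples = []
    for label, root in ((0, Path(real_dir)), (1, Path(fake_dir))):
        samples += [(path, label) for path in sorted(root.rglob("*.wav"))]
    return samples

samples = collect_samples()
print(f"Found {len(samples)} clips")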

Model Checkpoints

You may download the model checkpoints from here: Google Drive. Unzip the files and replace the saved directory with the extracted files.
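
If you want to inspect a checkpoint manually, the following is a minimal sketch assuming a standard PyTorch checkpoint file; the path and key layout here are hypothetical:

import torch

# Hypothetical path; the actual layout under saved/ depends on which
# feature/model combination was trained.
checkpoint = torch.load("saved/best.pt", map_location="cpu")

# The file may hold a bare state_dict or a dict wrapping one together
# with training metadata, so inspect the keys before loading.
if isinstance(checkpoint, dict):
    print(list(checkpoint)[:5])
# model.load_state_dict(checkpoint)  # once 'model' is built to match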

Training

Use the train.py script to train the model.

usage: train.py [-h] [--real_dir REAL_DIR] [--fake_dir FAKE_DIR] [--batch_size BATCH_SIZE] [--epochs EPOCHS]
                [--seed SEED] [--feature_classname {wave,lfcc,mfcc}]
                [--model_classname {MLP,WaveRNN,WaveLSTM,SimpleLSTM,ShallowCNN,TSSD}]
                [--in_distribution {True,False}] [--device DEVICE] [--deterministic] [--restore] [--eval_only] [--debug] [--debug_all]

optional arguments:
  -h, --help            show this help message and exit
  --real_dir REAL_DIR, --real REAL_DIR
                        Directory containing real data. (default: 'data/real')
  --fake_dir FAKE_DIR, --fake FAKE_DIR
                        Directory containing fake data. (default: 'data/fake')
  --batch_size BATCH_SIZE
                        Batch size. (default: 256)
  --epochs EPOCHS       Number of maximum epochs to train. (default: 20)
  --seed SEED           Random seed. (default: 42)
  --feature_classname {wave,lfcc,mfcc}
                        Feature classname. (default: 'lfcc')
  --model_classname {MLP,WaveRNN,WaveLSTM,SimpleLSTM,ShallowCNN,TSSD}
                        Model classname. (default: 'ShallowCNN')
  --in_distribution {True,False}, --in_dist {True,False}
                        Whether to use in distribution experiment setup. (default: True)
  --device DEVICE       Device to use. (default: 'cuda' if possible)
  --deterministic       Whether to use deterministic training (reproducible results).
  --restore             Whether to restore from checkpoint.
  --eval_only           Whether to evaluate only.
  --debug               Whether to use debug mode.
  --debug_all           Whether to use debug mode for all models.

Examples:

To check that all models can run successfully on your device, run the following command:

python train.py --debug_all

To train the model ShallowCNN with lfcc features in the in-distribution setting, you can run the following command:

python train.py --real data/real --fake data/fake --batch_size 128 --epochs 20 --seed 42 --feature_classname lfcc --model_classname ShallowCNN

Use the inline environment variable CUDA_VISIBLE_DEVICES to specify which GPU device(s) to use. For example:

CUDA_VISIBLE_DEVICES=0 python train.py
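
For reference, the --seed and --deterministic flags typically imply seeding along these lines; this is only a sketch, and train.py contains the project's actual implementation:

import random

import numpy as np
import torch

def set_seed(seed: int = 42, deterministic: bool = False) -> None:
    """Seed Python, NumPy, and PyTorch RNGs; optionally force cuDNN determinism."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False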

Evaluation

By default, we use the test set directly for validation during training. The best model and its predictions are automatically saved in the saved directory during training/testing; go to the saved directory to see the evaluation results.

To evaluate on the test set using a trained model, you can run the following command:

python train.py --feature_classname lfcc --model_classname ShallowCNN --restore --eval_only

Run the following command to re-compute the evaluation results based on saved predictions and labels:

python metrics.py
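
The headline metric for this kind of detector is usually an equal error rate (EER) computed over saved scores and labels. A minimal sketch of that computation follows; metrics.py is the authoritative implementation, and the function here is illustrative only:

import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels: np.ndarray, scores: np.ndarray) -> float:
    """EER: the ROC operating point where FPR equals FNR (1 - TPR)."""
    fpr, tpr, _ = roc_curve(labels, scores, pos_label=1)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return float((fpr[idx] + fnr[idx]) / 2)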

Acknowledgements

License

Our project is licensed under the MIT License.

Contributors

jamestiotio, madhu-balaji-01, markhershey, shsr2001

Issues

Error while running train with --debug

audioread==3.0.1
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
cmake==3.27.7
colorlog==6.7.0
contourpy==1.2.0
cycler==0.12.1
decorator==5.1.1
filelock==3.13.1
fonttools==4.45.0
fsspec==2023.10.0
idna==3.4
Jinja2==3.1.2
joblib==1.3.2
kiwisolver==1.4.5
lazy_loader==0.3
librosa==0.10.1
lit==17.0.5
llvmlite==0.41.1
MarkupSafe==2.1.3
matplotlib==3.8.2
mpmath==1.3.0
msgpack==1.0.7
networkx==3.2.1
numba==0.58.1
numpy==1.26.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
packaging==23.2
Pillow==10.1.0
platformdirs==4.0.0
pooch==1.8.0
puts==0.0.8
pycparser==2.21
pyparsing==3.1.1
python-dateutil==2.8.2
requests==2.31.0
scikit-learn==1.3.2
scipy==1.11.4
six==1.16.0
soundfile==0.12.1
soxr==0.3.7
sympy==1.12
threadpoolctl==3.2.0
torch==2.0.1
torchaudio==2.0.2
torchinfo==1.8.0
triton==2.0.0
typing_extensions==4.8.0
urllib3==2.1.0

These are the installed modules, and I am getting this error:

Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory

This error appeared after I downgraded torchaudio to 2.0.2.

With torchaudio 2.1.1, I had a different error while training:

RuntimeError: apply_effects_file requires sox extension which is not available. Please refer to the stacktrace above for how to resolve this.

How can I resolve these two errors?

Given the model and a audio file, print real or fake voice

I have been trying to use your model from the Google Drive (best.pt) and to preprocess audio for it, but I keep getting different kinds of errors whenever I change something, such as:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x241152 and 15104x128)
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [1, 1, 1, 40, 64600]

I am trying to build something just like your website, where, given an audio clip, it displays real or fake. Is there any code or reference?

evaluate_error

When I run the evaluation code, it shows this error:

2023-11-23 12:08:00,216 - ERROR - 'bool' object is not callable
Traceback (most recent call last):
  File "/home/pradeep/AudioDeepFakeDetection/train.py", line 593, in main
    experiment(
  File "/home/pradeep/AudioDeepFakeDetection/train.py", line 430, in experiment
    eval_only(
TypeError: 'bool' object is not callable

Issues with eval_one Function and Guidance on Using lfcc ShallowCNN Model

Hi,

I've made some modifications to the code to ensure compatibility with Windows, specifically addressing the Sox dependency with torchaudio by using Sox independently. Other than that, I've kept the original setup intact.

However, I've encountered an issue when using the eval_one function. Every audio input, whether genuine or fake, is consistently classified as fake (output always equals 1). This behavior is observed even when testing on authentic audio files.

I'd like to understand the correct procedure to utilize the lfcc ShallowCNN model that you've provided for evaluating audio files. Since I don't have an NVIDIA GPU, training is quite time-intensive for me, and I'm keen on testing the pre-trained model you've developed on new RVC voice-generated samples and others.

Any guidance on how to successfully test the model would be greatly appreciated. Thank you!

Can I use my own voice clip?

I was wondering: after training on my data, is it possible to use my own voice clip to detect whether it is fake or not?
