
Audio Deep Fake Detection

A Course Project for SUTD 50.039 Theory and Practice of Deep Learning (2022 Spring)

Created by Mark He Huang, Peiyuan Zhang, James Raphael Tiovalen, Madhumitha Balaji, and Shyam Sridhar.

Check out our: Project Report | Interactive Website (https://markhh.com/AudioDeepFakeDetection/)

Setup Environment

# Set up Python virtual environment
python3 -m venv venv && source venv/bin/activate

# Make sure pip, wheel, and setuptools are up to date
pip install -U pip wheel setuptools

# Install required dependencies
pip install -r requirements.txt

Setup Datasets

You may download the datasets used in this project from the following sources:

  • (Real) Human Voice Dataset: LJ Speech (v1.1)
    • This dataset consists of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books.
  • (Fake) Synthetic Voice Dataset: WaveFake (v1.20)
    • The dataset consists of 104,885 generated audio clips (16-bit PCM wav).

After downloading the datasets, you may extract them under data/real and data/fake respectively. In the end, the data directory should look like this:

data
├── real
│   └── wavs
└── fake
    ├── common_voices_prompts_from_conformer_fastspeech2_pwg_ljspeech
    ├── jsut_multi_band_melgan
    ├── jsut_parallel_wavegan
    ├── ljspeech_full_band_melgan
    ├── ljspeech_hifiGAN
    ├── ljspeech_melgan
    ├── ljspeech_melgan_large
    ├── ljspeech_multi_band_melgan
    ├── ljspeech_parallel_wavegan
    └── ljspeech_waveglow
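
For orientation, here is a minimal sketch, not the project's actual dataset code, of how this layout could be walked into (path, label) pairs, assuming the common convention of labeling real clips 0 and fake clips 1:

# Minimal sketch only; the repo's own dataset classes are authoritative.
# Assumes the convention real = 0, fake = 1.
from pathlib import Path

def collect_samples(real_dir="data/real", fake_dir="data/fake"):
    """Walk both trees and pair every .wav file with its label."""
    samples = []
    for label, root in ((0, Path(real_dir)), (1, Path(fake_dir))):
        samples += [(path, label) for path in sorted(root.rglob("*.wav"))]
    return samples

samples = collect_samples()
print(f"Found {len(samples)} clips")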

Model Checkpoints

You may download the model checkpoints from here: Google Drive. Unzip the files and replace the saved directory with the extracted files.
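
If you want to inspect a checkpoint manually, the following is a minimal sketch assuming a standard PyTorch checkpoint file; the path and key layout here are hypothetical:

import torch

# Hypothetical path; the actual layout under saved/ depends on which
# feature/model combination was trained.
checkpoint = torch.load("saved/best.pt", map_location="cpu")

# The file may hold a bare state_dict or a dict wrapping one together
# with training metadata, so inspect the keys before loading.
if isinstance(checkpoint, dict):
    print(list(checkpoint)[:5])
# model.load_state_dict(checkpoint)  # once 'model' is built to match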

Training

Use the train.py script to train the model.

usage: train.py [-h] [--real_dir REAL_DIR] [--fake_dir FAKE_DIR] [--batch_size BATCH_SIZE] [--epochs EPOCHS]
                [--seed SEED] [--feature_classname {wave,lfcc,mfcc}]
                [--model_classname {MLP,WaveRNN,WaveLSTM,SimpleLSTM,ShallowCNN,TSSD}]
                [--in_distribution {True,False}] [--device DEVICE] [--deterministic] [--restore] [--eval_only] [--debug] [--debug_all]

optional arguments:
  -h, --help            show this help message and exit
  --real_dir REAL_DIR, --real REAL_DIR
                        Directory containing real data. (default: 'data/real')
  --fake_dir FAKE_DIR, --fake FAKE_DIR
                        Directory containing fake data. (default: 'data/fake')
  --batch_size BATCH_SIZE
                        Batch size. (default: 256)
  --epochs EPOCHS       Number of maximum epochs to train. (default: 20)
  --seed SEED           Random seed. (default: 42)
  --feature_classname {wave,lfcc,mfcc}
                        Feature classname. (default: 'lfcc')
  --model_classname {MLP,WaveRNN,WaveLSTM,SimpleLSTM,ShallowCNN,TSSD}
                        Model classname. (default: 'ShallowCNN')
  --in_distribution {True,False}, --in_dist {True,False}
                        Whether to use in distribution experiment setup. (default: True)
  --device DEVICE       Device to use. (default: 'cuda' if possible)
  --deterministic       Whether to use deterministic training (reproducible results).
  --restore             Whether to restore from checkpoint.
  --eval_only           Whether to evaluate only.
  --debug               Whether to use debug mode.
  --debug_all           Whether to use debug mode for all models.

Examples:

To check that all models can run successfully on your device, run the following command:

python train.py --debug_all

To train the model ShallowCNN with lfcc features in the in-distribution setting, you can run the following command:

python train.py --real data/real --fake data/fake --batch_size 128 --epochs 20 --seed 42 --feature_classname lfcc --model_classname ShallowCNN

Use the inline environment variable CUDA_VISIBLE_DEVICES to specify which GPU device(s) to use. For example:

CUDA_VISIBLE_DEVICES=0 python train.py
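
For reference, the --seed and --deterministic flags typically imply seeding along these lines; this is only a sketch, and train.py contains the project's actual implementation:

import random

import numpy as np
import torch

def set_seed(seed: int = 42, deterministic: bool = False) -> None:
    """Seed Python, NumPy, and PyTorch RNGs; optionally force cuDNN determinism."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False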

Evaluation

By default, we use the test set directly for validation during training. The best model and its predictions are automatically saved in the saved directory during training/testing; go to the saved directory to see the evaluation results.

To evaluate on the test set using a trained model, you can run the following command:

python train.py --feature_classname lfcc --model_classname ShallowCNN --restore --eval_only

Run the following command to re-compute the evaluation results based on saved predictions and labels:

python metrics.py
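
The headline metric for this kind of detector is usually an equal error rate (EER) computed over saved scores and labels. A minimal sketch of that computation follows; metrics.py is the authoritative implementation, and the function here is illustrative only:

import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels: np.ndarray, scores: np.ndarray) -> float:
    """EER: the ROC operating point where FPR equals FNR (1 - TPR)."""
    fpr, tpr, _ = roc_curve(labels, scores, pos_label=1)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return float((fpr[idx] + fnr[idx]) / 2)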

Acknowledgements

License

Our project is licensed under the MIT License.

Contributors

jamestiotio, madhu-balaji-01, markhershey, shsr2001

Issues

Error while running train with --debug

audioread==3.0.1
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
cmake==3.27.7
colorlog==6.7.0
contourpy==1.2.0
cycler==0.12.1
decorator==5.1.1
filelock==3.13.1
fonttools==4.45.0
fsspec==2023.10.0
idna==3.4
Jinja2==3.1.2
joblib==1.3.2
kiwisolver==1.4.5
lazy_loader==0.3
librosa==0.10.1
lit==17.0.5
llvmlite==0.41.1
MarkupSafe==2.1.3
matplotlib==3.8.2
mpmath==1.3.0
msgpack==1.0.7
networkx==3.2.1
numba==0.58.1
numpy==1.26.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
packaging==23.2
Pillow==10.1.0
platformdirs==4.0.0
pooch==1.8.0
puts==0.0.8
pycparser==2.21
pyparsing==3.1.1
python-dateutil==2.8.2
requests==2.31.0
scikit-learn==1.3.2
scipy==1.11.4
six==1.16.0
soundfile==0.12.1
soxr==0.3.7
sympy==1.12
threadpoolctl==3.2.0
torch==2.0.1
torchaudio==2.0.2
torchinfo==1.8.0
triton==2.0.0
typing_extensions==4.8.0
urllib3==2.1.0

These are the installed modules, and I am getting this error:

Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory

This error appeared after I downgraded torchaudio to 2.0.2.

With torchaudio 2.1.1, I had a different error while training:

RuntimeError: apply_effects_file requires sox extension which is not available. Please refer to the stacktrace above for how to resolve this.

How can I resolve these two errors?

Given the model and a audio file, print real or fake voice

I have been trying to use your model from the Google Drive (best.pt) and to preprocess audio for it, but I keep getting different kinds of errors whenever I change something, such as:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x241152 and 15104x128)
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [1, 1, 1, 40, 64600]

I am trying to build something just like your website, where, given an audio clip, it displays real or fake. Is there any code or reference?

evaluate_error

When I run the evaluation code, it shows this error:

2023-11-23 12:08:00,216 - ERROR - 'bool' object is not callable
Traceback (most recent call last):
  File "/home/pradeep/AudioDeepFakeDetection/train.py", line 593, in main
    experiment(
  File "/home/pradeep/AudioDeepFakeDetection/train.py", line 430, in experiment
    eval_only(
TypeError: 'bool' object is not callable

Issues with eval_one Function and Guidance on Using lfcc ShallowCNN Model

Hi,

I've made some modifications to the code to ensure compatibility with Windows, specifically addressing the Sox dependency with torchaudio by using Sox independently. Other than that, I've kept the original setup intact.

However, I've encountered an issue when using the eval_one function. Every audio input, whether genuine or fake, is consistently classified as fake (output always equals 1). This behavior is observed even when testing on authentic audio files.

I'd like to understand the correct procedure to utilize the lfcc ShallowCNN model that you've provided for evaluating audio files. Since I don't have an NVIDIA GPU, training is quite time-intensive for me, and I'm keen on testing the pre-trained model you've developed on new RVC voice-generated samples and others.

Any guidance on how to successfully test the model would be greatly appreciated. Thank you!

Can I use my own voice clip?

I was wondering: after training on my data, is it possible to use my own voice clip to detect whether it is fake or not?
