aiden200 / 2d3mf

Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"

License: Other

Python 84.52% Shell 2.44% Makefile 0.01% Batchfile 0.01% Jupyter Notebook 11.89% C++ 0.34% C 0.01% Cuda 0.56% Perl 0.03% Cython 0.15% Lua 0.05%
audio deep-learning deepfake-detection machine-learning multimodal pytorch video

2d3mf's Introduction

Hi there 👋, I'm Aiden

  • Here is my website!
  • 🔭 I'm currently a: Graduate Student at the University of Southern California!
  • 🌱 I'm currently learning: ML research - CV, Multimodal LLMs, Recommendation Systems.
  • 💬 Ask me about: Multimodal LLMs!
  • 📧 Email: [email protected]
  • LinkedIn: Aiden Chang
  • 🖥️ Medium: aidenchang
  • ⚡ Fun fact: I've lived in three different countries & was ranked 47th in the nation for freestyle skiing!

Toolset

python css3 c html5 javascript typescript

2d3mf's People

Contributors

adriansroman, aiden200, aromanusc, controlnet, cy3021561, hermes7308, hyunkeup

Forkers

bearownage

2d3mf's Issues

Pre-train Resnet for emotion detection

The task is to pre-train a ResNet model on a simple emotion classification task. The input to the ResNet should be MFCCs computed from 1 second of audio at a sampling rate of sr=44100 Hz with n_mfcc=10, i.e. you can use librosa.feature.mfcc(y=y_1sec_audio, sr=44100, n_mfcc=10).

The network can be trained with audio clips from the RAVDESS dataset. There are 8 labels to predict:
01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised

This pre-trained network will then be used as a feature extractor within our 2D3MF pipeline.
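
As a rough sketch of the data side of this task (not the repo's actual loader): RAVDESS file names consist of seven dash-separated fields, and the third field is the emotion code listed above. The helper names `emotion_label` and `one_second_mfcc` are hypothetical, and zero-padding clips shorter than one second is an assumption, not something this issue specifies.

```python
from pathlib import Path

# Emotion codes as listed in this issue (third field of a RAVDESS file name).
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def emotion_label(filename):
    """Return (zero-based class index, emotion name) for a RAVDESS file name."""
    fields = Path(filename).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS-style file name: {filename}")
    code = fields[2]
    if code not in EMOTIONS:
        raise ValueError(f"unknown emotion code {code!r} in {filename}")
    return int(code) - 1, EMOTIONS[code]


def one_second_mfcc(path, sr=44100, n_mfcc=10):
    """Compute MFCCs for the first second of an audio file (hypothetical helper)."""
    # Imported lazily so the label helper above works without librosa installed.
    import librosa
    import numpy as np

    y, _ = librosa.load(path, sr=sr, duration=1.0)
    if len(y) < sr:
        # Assumption: zero-pad clips shorter than one second.
        y = np.pad(y, (0, sr - len(y)))
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
```

The class index is zero-based so it can be passed directly to a cross-entropy loss; the MFCC array has shape (n_mfcc, frames) and would still need to be reshaped into whatever input layout the ResNet expects.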

Grid Search

To test with AutoML, implement grid search.
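
A minimal sketch of what an exhaustive grid search could look like, using only the standard library. The hyperparameter names (`hidden_dim`, `lr`) and the `evaluate` callback are placeholders for illustration; in practice `evaluate` would train the model with the given settings and return a validation score.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return (best_params, best_score).

    param_grid maps a hyperparameter name to a list of candidate values.
    evaluate takes a dict of parameters and returns a score (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy usage: a stand-in scoring function instead of a real training run.
def toy_score(params):
    return -abs(params["hidden_dim"] - 128) - params["lr"]


best_params, best_score = grid_search(
    {"hidden_dim": [64, 128, 256], "lr": [1e-3, 1e-4]},
    toy_score,
)
```

The number of runs is the product of the list lengths, so the grid should stay small when each evaluation is a full training run.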

Middle Fusion Ablations

We want to try out different middle fusion structures. Please implement the following ablations for middle fusions using the files
model/multi_modal_middle_fusion.py, model/transformer_blocks.py, and model/classifier.py.

Unlike the diagram, the values are currently multiplied at the end. Here are the ablations:

[Figure: Middle_fusion_substitutions (1)]

Unify requirements.txt

Currently we have two requirements.txt files, a leftover from the development process. Now that the repo is stable and the major chunk of development is finished, we should unify the requirements into a single file.

General Preprocessing

We need a general preprocessing pipeline that crops the videos and tags them as real (-0) or fake (-1), then generates the .npy files based on a backbone. Args: data dir, backbone type (small, base, large).

Please specify a required structure for input. Example:

**Data dir**
   Video
       Real
       Fake
   Audio
       Real
       Fake
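
One possible first step for such a pipeline, sketched under assumptions: validate the directory layout above and assign each file a label. The function name `index_dataset`, the integer labels 0 (real) and 1 (fake), and the returned tuple format are all illustrative; cropping and .npy generation are not shown.

```python
from pathlib import Path

MODALITIES = ("Video", "Audio")
LABELS = {"Real": 0, "Fake": 1}  # assumption: 0 = real, 1 = fake
BACKBONES = ("small", "base", "large")


def index_dataset(data_dir, backbone="base"):
    """Check the required layout and return a list of (modality, path, label)."""
    if backbone not in BACKBONES:
        raise ValueError(f"backbone must be one of {BACKBONES}, got {backbone!r}")
    root = Path(data_dir)
    samples = []
    for modality in MODALITIES:
        for class_name, label in LABELS.items():
            class_dir = root / modality / class_name
            if not class_dir.is_dir():
                raise FileNotFoundError(f"missing required directory: {class_dir}")
            # Sort for a deterministic order across runs and platforms.
            for path in sorted(class_dir.iterdir()):
                if path.is_file():
                    samples.append((modality, path, label))
    return samples
```

Failing early on a missing directory keeps a malformed data dir from silently producing a dataset with only one class.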

Hidden Dimension Change

In model/classifier.py, self.hidden_layers needs to be changeable dynamically without throwing an error. It is currently set to 128.

Classes that are affected by self.hidden_layers:
AudioCNNPool, VideoCnnPool, AttentionBlock
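
A common way to make a hidden size configurable is to pass it through every constructor instead of repeating the literal 128 in each class. The sketch below illustrates that pattern in PyTorch with stand-in modules; `AttentionBlockSketch` and `ClassifierSketch` are hypothetical and do not reproduce the internals of the repo's AudioCNNPool, VideoCnnPool, or AttentionBlock.

```python
import torch
from torch import nn


class AttentionBlockSketch(nn.Module):
    """Stand-in attention block whose width comes from the caller."""

    def __init__(self, hidden_dim, num_heads=4):
        super().__init__()
        if hidden_dim % num_heads != 0:
            raise ValueError("hidden_dim must be divisible by num_heads")
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, x):
        out, _ = self.attn(x, x, x)  # self-attention over the sequence
        return self.norm(x + out)    # residual connection, then layer norm


class ClassifierSketch(nn.Module):
    """Stand-in classifier: hidden_dim is set once and passed to sub-modules."""

    def __init__(self, in_dim, hidden_dim=128, num_classes=2):
        super().__init__()
        self.proj = nn.Linear(in_dim, hidden_dim)
        self.block = AttentionBlockSketch(hidden_dim)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x has shape (batch, sequence, in_dim)
        h = self.block(self.proj(x))
        return self.head(h.mean(dim=1))  # average over the sequence axis


# The same code runs for several hidden sizes without shape errors.
for hidden in (64, 128, 256):
    logits = ClassifierSketch(in_dim=40, hidden_dim=hidden)(torch.randn(2, 5, 40))
```

Validating constraints such as divisibility by the head count in the constructor turns a late shape error into an immediate, readable one.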
