
Home Page: https://dl.acm.org/doi/abs/10.1007/978-3-031-06433-3_19

License: MIT License

Topics: deepfake-detection, deepfakes, dfdc, deepfakes-classification, faceforensics, deep-learning, vision-transformer, convolutional-neural-networks, pytorch, efficientnet, deepfake-detection-challenge, deepfake-videos


Combining EfficientNet and Vision Transformers for Video Deepfake Detection


Code for the video deepfake detection model from "Combining EfficientNet and Vision Transformers for Video Deepfake Detection", available on arXiv and presented at ICIAP 2021 [Pre-print PDF | Springer]. Using this repository it is possible to train and test the two main architectures presented in the paper, Efficient Vision Transformers and Cross Efficient Vision Transformers, for video deepfake detection. The architectures internally exploit the EfficientNet-PyTorch and ViT-PyTorch repositories.

Setup

Clone the repository and move into it:

git clone https://github.com/davide-coccomini/Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection.git

cd Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection

Set up the Python environment using conda:

conda env create --file environment.yml
conda activate deepfakes
export PYTHONPATH=.
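To quickly verify that the environment was created correctly and that PyTorch can see a CUDA-capable GPU, you can run the following one-liner (a minimal sanity check, assuming PyTorch and CUDA support are provided by environment.yml):

python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"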

Get the data

Download and extract the dataset you want to use (e.g. DFDC or FaceForensics++).

Preprocess the data

The preprocessing phase is based on Selim Seferbekov's implementation.

In order to perform deepfake detection it is first necessary to identify and extract the faces from all the videos in the dataset. Detect the faces inside the videos:

cd preprocessing
python3 detect_faces.py --data_path "path/to/videos"

By default the considered dataset structure is the DFDC one, but you can customize it with the following parameter:

  • --dataset: Dataset (DFDC / FACEFORENSICS)

The extracted boxes will be saved inside the "path/to/videos/boxes" folder. In order to get the best possible result, make sure that at least one face is identified in each video. If not, you can reduce the threshold values of the MTCNN on line 38 of face_detector.py and run the command again until at least one detection occurs. At the end of the execution of face_detector.py an error message will appear if the detector was unable to find faces inside some videos.

If you want to manually check that at least one face has been identified in each video, make sure that the number of files in the "boxes" folder is equal to the number of videos. To count the files in the folder use:

cd path/to/videos/boxes
ls | wc -l

Extract the detected faces obtaining the images:

python3 extract_crops.py --data_path "path/to/videos" --output_path "path/to/output"

By default the considered dataset structure is the DFDC one, but you can customize it with the following parameter:

  • --dataset: Dataset (DFDC / FACEFORENSICS)

Repeat detection and extraction for all the different parts of your dataset.

After extracting all the faces from the videos in your dataset, organise the "dataset" folder as follows:

- dataset
    - training_set
        - Deepfakes
            - video_name_0
                0_0.png
                1_0.png
                2_0.png
                ...
                N_0.png
            ...
            - video_name_K
                0_0.png
                1_0.png
                2_0.png
                ...
                M_0.png
        - DFDC
        - Face2Face
        - FaceShifter
        - FaceSwap
        - NeuralTextures
        - Original
    - validation_set
        ...
            ...
                ...
                ...
    - test_set
        ...
            ...
                ...
                ...

We suggest exploiting the --output_path parameter when executing extract_crops.py to build the folder structure properly, as in the example below.
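For instance, a possible invocation for the FaceForensics++ "Deepfakes" manipulation could look like the following; the paths are placeholders and the exact sub-folder handling depends on your local layout, so treat this as an illustrative sketch rather than the exact command used by the authors:

python3 extract_crops.py --data_path "path/to/faceforensics/manipulated_sequences/Deepfakes" --dataset FACEFORENSICS --output_path "path/to/dataset/training_set/Deepfakes"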

Evaluate

Move into the folder of the architecture you want to evaluate and download the pre-trained model:

(Efficient ViT)

cd efficient-vit
wget http://datino.isti.cnr.it/efficientvit_deepfake/efficient_vit.pth

(Cross Efficient ViT)

cd cross-efficient-vit
wget http://datino.isti.cnr.it/efficientvit_deepfake/cross_efficient_vit.pth

If you are unable to use the previous URLs, you can download the weights from Google Drive.

Then, issue the following command to evaluate a given model, providing the pre-trained model path and the configuration file available in the configs directory:

python3 test.py --model_path "pretrained_models/[model]" --config "configs/architecture.yaml"

By default the command will test on the DFDC dataset, but you can customize the following parameters for both architectures:

  • --dataset: Which dataset to use (Deepfakes|Face2Face|FaceShifter|FaceSwap|NeuralTextures|DFDC)
  • --max_videos: Maximum number of videos to use for the evaluation (default: all)
  • --workers: Number of data loader workers (default: 10)
  • --frames_per_video: Number of equidistant frames for each video (default: 30)
  • --batch_size: Prediction Batch Size (default: 32)

To evaluate a customized model trained from scratch with a different architecture you need to edit the configs/architecture.yaml file.

Train

For the DFDC dataset only, prepare the metadata by moving all the metadata.json files (by default inside the dfdc_train_part_X folders) into a dedicated subfolder:

mkdir data/metadata
cd path/to/videos/training_set
mv **/metadata.json ../../../data/metadata

In order to train the models using our architecture configurations use:

(Efficient ViT)

cd efficient-vit
python3 train.py --config configs/architecture.yaml

(Cross Efficient ViT)

cd cross-efficient-vit
python3 train.py --config configs/architecture.yaml

By default the commands will train on the DFDC dataset, but you can customize the following parameters for both architectures:

  • --num_epochs: Number of training epochs (default: 300)
  • --workers: Number of data loader workers (default: 10)
  • --resume: Path to latest checkpoint (default: none)
  • --dataset: Which dataset to use (Deepfakes|Face2Face|FaceShifter|FaceSwap|NeuralTextures|All) (default: All)
  • --max_videos: Maximum number of videos to use for training (default: all)
  • --patience: How many epochs to wait before stopping when the validation loss is not improving (default: 5)

For the Efficient ViT model only, it is also possible to customize the patch extractor and use different versions of EfficientNet (only B0 and B7) by adding the following parameter:

  • --efficient_net: Which EfficientNet version to use (0 or 7, default: 0)
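As an illustration, a training run of the Efficient ViT restricted to the Face2Face subset, with EfficientNet-B0 as patch extractor and a longer patience, could be launched as follows (all values are examples, not the settings used in the paper):

python3 train.py --config configs/architecture.yaml --dataset Face2Face --num_epochs 100 --patience 10 --efficient_net 0 --workers 8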

Reference

@InProceedings{10.1007/978-3-031-06433-3_19,
author="Coccomini, Davide Alessandro
and Messina, Nicola
and Gennaro, Claudio
and Falchi, Fabrizio",
editor="Sclaroff, Stan
and Distante, Cosimo
and Leo, Marco
and Farinella, Giovanni M.
and Tombari, Federico",
title="Combining EfficientNet and Vision Transformers for Video Deepfake Detection",
booktitle="Image Analysis and Processing -- ICIAP 2022",
year="2022",
publisher="Springer International Publishing",
address="Cham",
pages="219--229",
isbn="978-3-031-06433-3"
}

Contributors

ano0904, davide-coccomini


Issues

extract face not working on faceforensics dataset

Hi, can you help? I'm not able to extract all the faces from the dataset.

(deepfakes) e5-cse-344-30:/home/mdl/amk7371/Combining-EfficientNet-and-Vision-Transformers-for-Video-Deepfake-Detection/preprocessing> python3 detect_faces.py --data_path ../../faceforensics/ --dataset FACEFORENSICS
Namespace(data_path='../../faceforensics/', dataset='FACEFORENSICS', detector_type='FacenetDetector', processes=8)
8%|████████████▏ | 713/9431 [33:02<9:24:36, 3.89s/it]

11%|█████████████████ | 1014/9431 [48:43<26:15:58, 11.23s/it]Traceback (most recent call last):
File "/home/mdl/amk7371/x86_64/envs/deepfakes/lib/python3.7/multiprocessing/queues.py", line 242, in _feed
send_bytes(obj)
File "/home/mdl/amk7371/x86_64/envs/deepfakes/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/home/mdl/amk7371/x86_64/envs/deepfakes/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

Test dataset arrangement issues

Hi @davide-coccomini ,
I have arranged the folders as depicted below

- deep_fakes
    - dataset
        - test_set
            - manipulated_sequences (the fake videos)
            - original_sequences (the original videos)

Now, I am working on FaceForensics data for only 5 videos (I intended to run the pre-trained model).
With the help of extract_crops.py I managed to crop the faces from the manipulated videos.
My queries:

  1. What would the Original folder contain and where would that be present in this scenario of my file structure?
  2. The command that I ran is
    !python3 test.py --model_path "cross_efficient_vit.pth" --config "configs/architecture.yaml" --dataset "FACEFORENSICS" --workers 2 --batch_size 1

I changed the code in test.py; the code snippet is below:

def get_method(video, data_path):
    temp = os.path.join(data_path, "manipulated_sequences").split('../')[-1]
    # methods = os.listdir(os.path.join(data_path, "manipulated_sequences"))
    methods = os.listdir(temp)
    print(methods)
    temp1 = os.path.join(data_path, "original_sequences").split('../')[-1]
    # methods.extend(os.listdir(os.path.join(data_path, "original_sequences")))
    methods.extend(os.listdir(temp1))
    methods.append("DFDC")  # Why should DFDC be added when I work with FACEFORENSICS data?
    methods.append("Original")
    selected_method = ""
    for method in methods:
        if method in video:
            selected_method = method
            break
    return selected_method

Please help!

Details on training and testing

Hi @davide-coccomini,
As you mentioned in the paper, you used both the FF++ and DFDC datasets to train the model:

  • Which version of FF++ dataset do you use (raw/c23/c40)?
  • As for DFDC only the training data is available and the public and private test sets are not released. How do you use the DFDC dataset? Do you do a split from the training set itself into train/val/test?

data set

Hello, does your preprocessing code support using only the FaceForensics++ dataset? When I run your face detection code, I get an error (screenshot not included).
This is where my dataset is stored (screenshot not included).

Grouping of faces corresponding to same person for inference

Hi @davide-coccomini, I see that in test.py file for inference, you group the faces corresponding to each person using the index in the filename of each bounding box. Is that correct? Or is there any other thing that you have added in the code for grouping bbox corresponding to each person before feeding into the model for prediction?

# Group the faces with the same index, reduce probability to skip some faces in the same video
for path in frames_paths:
    for i in range(0,3): # Consider up to 3 faces per video
        if "_" + str(i) in path:
            if i not in frames_paths_dict.keys():
                frames_paths_dict[i] = [path]
            else:
                frames_paths_dict[i].append(path)

How to get the pretrained model?

@davide-coccomini
wget ...
--2021-09-10 21:44:46-- http://.../
Resolving ... (...)... failed: Name or service not known.
wget: unable to resolve host address ‘...’

I'm stuck downloading the pre-trained model.

environment

Sorry, but I had some problems creating the environment. Can you give me some advice about this issue?
I followed your command

conda env create --file environment.yml -n deepfakes

and deleted the lines below in environment.yml:
name: deepfakes (first line)
prefix: ...... (last line)
Thank you so much, looking forward to your reply.

weights

Hello,

Thank you for sharing this deepfakes model.

I have a question regarding the script efficient_vit.py.

It seems to load a checkpoint: "weights/final_999_DeepFakeClassifier_tf_efficientnet_b7_ns_0_23", but I haven't found it in the repo. Is it possible to upload this one?
Moreover, what is the difference between this checkpoint and efficient_vit.pth?

Best regards,

Multiprocessing Error

(error screenshots not included)
Hi, I've followed your steps in README.md, and the preprocessing part worked well for me. But when it comes to training, I received the error shown in the screenshots. Could you give me some suggestions, please?

DFDC Pre-processing timeline

Hi Davide,
When you pre-processed the 50 subfolders of DFDC, did you do it sequentially or in parallel? How much time did it take you to pre-process the entire DFDC dataset (just to obtain the .json files and not the images from the .json)? If you did it in parallel, how did you achieve parallelism in this case? Currently, it's taking me approximately 3 hours to pre-process 1300 videos (that is, 8-9 seconds per video). Is this the expected behavior when running on an Nvidia A30? At this rate, it will take me about 13 days to pre-process all 120,000 videos in DFDC just to obtain the .json files (that is, to run detect_faces.py on the entire DFDC). I downloaded the c23 DFDC videos. Please suggest whether this is the expected behavior or not. Also, I would really appreciate it if you could suggest ways that worked for you to speed up the pre-processing and not wait for days. Thanks!

train_

(screenshot not included)

Can you take a look at this question? Can you help me?

About Cuda

Unfortunately, I cannot use CUDA because I don't have an Nvidia graphics card. How can I run your code without CUDA?

Combining EfficientNet and Vision Transformers

Hello, I'm a university student currently researching the topic of deepfake detection. I also want to follow the Combining-EfficientNet-and-Vision-Transformers direction like you, but it seems quite difficult. Could you point me to a simpler reference project than this one to develop from? Thank you.

About data preprocessing

I have a question about the data preprocessing. I see that you resize the frame in face_detector.py on line 68 when you create the VideoDataset. Will it decrease the final result? And is the resize operation necessary if I want to use preprocessing like this for another model? (screenshot not included)

Mistake?

Should that be len(video_faces) instead of len(video) in line 252 of test.py in cross-efficient-vit folder?

Error Encountered in Pretrained Model Evaluation

I hope this message finds you well.

I wanted to extend my sincere appreciation for the significant contributions you have made to our project. Your guidance and instructions have been invaluable in helping us navigate the complexities of deepfake detection.

I am writing to inform you about an issue we encountered while evaluating the pretrained model, as per the instructions outlined in the README file. Upon running the evaluation script (test.py), we encountered the following error:

FileNotFoundError: [Errno 2] No such file or directory: '../../deep_fakes/dataset/test_set/DFDC'

It appears that the script is unable to locate the specified directory 'test_set/DFDC' within the dataset directory structure. We have verified the directory structure and ensured that the necessary data is present, but the error persists.

We would greatly appreciate your guidance on resolving this issue. Any insights or suggestions you can provide would be immensely helpful in overcoming this obstacle and progressing with our deepfake detection app development.

Thank you once again for your ongoing support and assistance. We look forward to hearing from you soon.

Warm regards,

Ayesha

no faces detected when run the detect_faces.py

@davide-coccomini Hi, sorry to bother you here, but I have met a problem shown in the screenshot below (not included):
No faces are detected in the DFDC dataset even when I set the MTCNN threshold very low, and here is the structure of my data (screenshot not included):
As you can see, I extracted only part of the whole DFDC dataset there because I haven't downloaded all of it yet. However, I checked the code in detect_faces.py and found that the script traverses the directory under dfdc and then reads the videos from the sub-directories, so I was confused about why it can't detect any faces from the videos. Is there something wrong with my data_path? I hope you can give me some advice! :-)

License?

Could you please add a license, e.g. MIT license, so I could fork this and continue working on it?

Patch size

Hi, is it possible to edit the patch size and the related options (e.g. 14 instead of 7)? Or is it required to retrain the model?
Thanks.

Convolutional Cross ViT architecture

Hi @davide-coccomini,

When I read your paper, I had a question about the Cross ViT architecture. Can you help me answer it?
In your Convolutional Cross ViT architecture, which uses the convolutional architecture by Wodajo and Atnafu, did you use a pre-trained model from the authors, or did you train it from scratch?

extract error

I only have the Celeb dataset now, and its folder structure is shown in the figure (screenshot not included). I managed to extract the box files with detect_faces.py (please check whether that's the case), and I set "--dataset=DFDC". But when I execute extract_crops.py, I get the following error. I wonder if it has something to do with my folder structure. I would appreciate your help!

Custom test set

First of all, I want to express my appreciation for your repository. It's incredibly helpful to have access to pretrained models like the ones you've provided. Thank you for sharing your work with the community!

I have a question regarding the pretrained models you've uploaded. I'm interested in testing your models on our own test set to evaluate their performance. Could you please guide me on how I can do that? Any instructions or pointers would be greatly appreciated.

Thanks again for your valuable contributions!

AttributeError: 'EfficientNet' object has no attribute 'delete_blocks'

Loaded pretrained weights for efficientnet-b0

AttributeError Traceback (most recent call last)
Cell In[19], line 1
----> 1 model = CrossEfficientViT(config=config)
2 model.eval()

Cell In[14], line 267, in CrossEfficientViT.__init__(self, config)
262 dropout = config['model']['dropout']
263 emb_dropout = config['model']['emb-dropout']
--> 267 self.sm_image_embedder = ImageEmbedder(dim = sm_dim, image_size = image_size, patch_size = sm_patch_size, dropout = emb_dropout, efficient_block = 16, channels=sm_channels)
268 self.lg_image_embedder = ImageEmbedder(dim = lg_dim, image_size = image_size, patch_size = lg_patch_size, dropout = emb_dropout, efficient_block = 1, channels=lg_channels)
270 self.multi_scale_encoder = MultiScaleEncoder(
271 depth = depth,
272 sm_dim = sm_dim,
(...)
289 dropout = dropout
290 )

Cell In[14], line 186, in ImageEmbedder.__init__(self, dim, image_size, patch_size, dropout, efficient_block, channels)
184 assert image_size % patch_size == 0, 'Image dimensions must be divisible by the patch size.'
185 self.efficient_net = EfficientNet.from_pretrained('efficientnet-b0')
--> 186 self.efficient_net.delete_blocks(efficient_block)
187 self.efficient_block = efficient_block
189 for index, (name, param) in enumerate(self.efficient_net.named_parameters()):

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1614, in Module.__getattr__(self, name)
1612 if name in modules:
1613 return modules[name]
-> 1614 raise AttributeError("'{}' object has no attribute '{}'".format(
1615 type(self).__name__, name))

AttributeError: 'EfficientNet' object has no attribute 'delete_blocks'

Failed to setup python environment through environment.yml

Initially, I wanted to set up the Python environment through conda env create --file environment.yml. However, there is something wrong with the versions (screenshot not included).
Then I deleted the pinned versions of the packages, but something new went wrong.

So could you please check whether there is something wrong in the environment file?

train

(screenshot not included)
Hi, can you help me?

About patch

Thanks for your work @davide-coccomini
In the figure of the paper, I found that the dimensions of the patch images in the cross structure and the non-cross structure are inconsistent. In the cross structure, the patch size seems to be greater than 7x7 or 56x56, while in the non-cross structure the patch size is 7x7. However, in the cross structure, isn't the feature patch size generated by the S-branch also 7x7?

Some questions about num-classes and class_weights

(1) I would like to ask why the value of num_classes is 1 instead of 2?
(2) train_counters = collections.Counter(image[1] for image in train_dataset)
    class_weights = train_counters[0] / train_counters[1]
    loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor([class_weights]))
In the case where the number of frames in each video is fixed and the same, is class_weights equal to 1?
I have a weak foundation; thank you very much for your answer.

How to train crossvit on b5 or higher?

I simply changed the EfficientNet name in the code, and it loads the B5 pretrained weights successfully. But while training it gets stuck at x = self.to_patch_embedding(x) in the forward function. Thanks for your great work and help!

extract error

(screenshot not included)
I used the Celeb dataset, but changed its format to the FF++ one to be consistent with your code. The folder structure for storing videos and the folder structure for storing face pictures are shown in the figure (screenshots not included). The photo with the train folder is the folder that holds the face images.

Environment setting for MacM1

Hello, I have an issue creating the deepfakes environment. As I try to create it from environment.yml I get the following error :

The following specifications were found to be incompatible with your system:

  • feature:/osx-64::__osx==10.16=0
  • feature:/osx-64::__unix==0=0
  • feature:|@/osx-64::__osx==10.16=0
  • feature:|@/osx-64::__unix==0=0
  • nb_conda=2.2.1 -> __unix
  • nb_conda=2.2.1 -> __win
  • notebook=6.3.0 -> ipykernel -> __linux
  • notebook=6.3.0 -> ipykernel -> __osx
  • notebook=6.3.0 -> ipykernel -> __win

Your installed version is: 10.16

Then as I try to activate the deepfakes environment, I get the following message :

EnvironmentNameNotFound: Could not find conda environment: deepfakes .

I don't know why this doesn't work (I think it is because of an incompatibility with the new version of macOS, but I'm not sure) and your help will be highly appreciated!

Thank you,
Francois

Dataset

Hello, I typed this command, but why doesn't the system respond?
(screenshots not included)
Can you help me see what's wrong?
Thanks!

How do I train my dataset?

Hello! I have the image dataset ready and split into true and false folders; how can I use it for direct training instead of read_frame()?
(screenshot not included)

Question about more than one person on a video.

Hi, I'm new to machine learning, so I'm sorry if I ask a weird question, but how do you manage the case where there is more than one face in a video? In FF++, fakes have only one manipulated face even when there is more than one face in a scene. Thank you in advance!

test AUC

Hello, I trained on a total of 110,000 real images and 100,000 fake images using DFDC and FF++, but the final test only achieved an AUC of 0.885. Can you give me some suggestions? Thank you.

Potential bug for detecting deepfake video if there are multiple people.

In the preprocessing code detect_faces.py and extract_crops.py, it seems that the code does not pay attention to the order of faces in each frame, but in the evaluation code test.py, it seems that you assume the order of faces is consistent across all frames.

Because of that, the code in test.py may result in unexpected behavior, such as mixing different people's faces into the same group and making the wrong prediction.

Do I misunderstand something? Or is it not supposed to be deployed on videos with multiple faces?

Validation set

Hey Davide, when I was working with the DFDC dataset, I tried to download the validation set and test set of DFDC from https://dfdc.ai/ like you, but I don't know why the request always fails and they cannot be downloaded.
(screenshot not included)

The size of tensor a (64) must match the size of tensor b (32) at non-singleton dimension

When training the EfficientViT, I tried to change the batch size to 64, and an error was raised:
RuntimeError: The size of tensor a (64) must match the size of tensor b (32) at non-singleton dimension 0, in efficient-vit/efficient_vit.py line 169: x += self.pos_embedding[0:shape]. I think there may be an error in how the positional embedding is indexed.
I modified it. I don't know if it's correct, but it works normally:

import torch
from torch import nn
from einops import rearrange
from efficientnet_pytorch import EfficientNet
import cv2
import re
from utils import resize
import numpy as np
from torch import einsum
from random import randint


class Residual(nn.Module):
	def __init__(self, fn):
		super().__init__()
		self.fn = fn

	def forward(self, x, **kwargs):
		return self.fn(x, **kwargs) + x


class PreNorm(nn.Module):
	def __init__(self, dim, fn):
		super().__init__()
		self.norm = nn.LayerNorm(dim)
		self.fn = fn

	def forward(self, x, **kwargs):
		return self.fn(self.norm(x), **kwargs)


class FeedForward(nn.Module):
	def __init__(self, dim, hidden_dim, dropout=0.):
		super().__init__()
		self.net = nn.Sequential(
			nn.Linear(dim, hidden_dim),
			nn.GELU(),
			nn.Dropout(dropout),
			nn.Linear(hidden_dim, dim),
			nn.Dropout(dropout)
		)

	def forward(self, x):
		return self.net(x)


class Attention(nn.Module):
	def __init__(self, dim, heads=8, dim_head=64, dropout=0.):
		super().__init__()
		inner_dim = dim_head * heads
		project_out = not (heads == 1 and dim_head == dim)

		self.heads = heads
		self.scale = dim_head ** -0.5

		self.attend = nn.Softmax(dim=-1)
		self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)

		self.to_out = nn.Sequential(
			nn.Linear(inner_dim, dim),
			nn.Dropout(dropout)
		) if project_out else nn.Identity()

	def forward(self, x):
		b, n, _, h = *x.shape, self.heads
		qkv = self.to_qkv(x).chunk(3, dim=-1)
		q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h=h), qkv)

		dots = einsum('b h i d, b h j d -> b h i j', q, k) * self.scale

		attn = self.attend(dots)

		out = einsum('b h i j, b h j d -> b h i d', attn, v)
		out = rearrange(out, 'b h n d -> b n (h d)')
		return self.to_out(out)


class Transformer(nn.Module):
	def __init__(self, dim, depth, heads, dim_head, mlp_dim, dropout=0.):
		super().__init__()
		self.layers = nn.ModuleList([])
		for _ in range(depth):
			self.layers.append(nn.ModuleList([
				PreNorm(dim, Attention(dim, heads=heads, dim_head=dim_head, dropout=dropout)),
				PreNorm(dim, FeedForward(dim=dim, hidden_dim=mlp_dim, dropout=0))
			]))

	def forward(self, x):
		for attn, ff in self.layers:
			x = attn(x) + x
			x = ff(x) + x
		return x


class EfficientViT(nn.Module):
	def __init__(self, config, channels=512, selected_efficient_net=0):
		super().__init__()

		image_size = config['model']['image-size']
		patch_size = config['model']['patch-size']
		num_classes = config['model']['num-classes']
		dim = config['model']['dim']
		depth = config['model']['depth']
		heads = config['model']['heads']
		mlp_dim = config['model']['mlp-dim']
		emb_dim = config['model']['emb-dim']
		dim_head = config['model']['dim-head']
		dropout = config['model']['dropout']
		emb_dropout = config['model']['emb-dropout']

		assert image_size % patch_size == 0, 'image dimensions must be divisible by the patch size'

		self.selected_efficient_net = selected_efficient_net

		if selected_efficient_net == 0:
			self.efficient_net = EfficientNet.from_pretrained('efficientnet-b0')
		else:
			self.efficient_net = EfficientNet.from_pretrained('efficientnet-b7')
			checkpoint = torch.load("weights/final_999_DeepFakeClassifier_tf_efficientnet_b7_ns_0_23", map_location="cpu")
			state_dict = checkpoint.get("state_dict", checkpoint)
			self.efficient_net.load_state_dict({re.sub("^module.", "", k): v for k, v in state_dict.items()}, strict=False)

		for i in range(0, len(self.efficient_net._blocks)):
			for index, param in enumerate(self.efficient_net._blocks[i].parameters()):
				if i >= len(self.efficient_net._blocks) - 3:
					param.requires_grad = True
				else:
					param.requires_grad = False

		self.num_patches = (7 // patch_size) ** 2
		patch_dim = channels * patch_size ** 2

		self.patch_size = patch_size

		self.pos_embedding = nn.Parameter(torch.randn(1, self.num_patches + 1, dim))
		self.patch_to_embedding = nn.Linear(patch_dim, dim)
		self.cls_token = nn.Parameter(torch.randn(1, 1, dim))
		self.dropout = nn.Dropout(emb_dropout)
		self.transformer = Transformer(dim, depth, heads, dim_head, mlp_dim, dropout)

		self.to_cls_token = nn.Identity()

		self.mlp_head = nn.Sequential(
			nn.Linear(dim, mlp_dim),
			nn.ReLU(),
			nn.Linear(mlp_dim, num_classes)
		)

	def forward(self, img, mask=None):
		p = self.patch_size
		x = self.efficient_net.extract_features(img)  # 1280x7x7
		# x = self.features(img)
		'''
		for im in img:
			image = im.cpu().detach().numpy()
			image = np.transpose(image, (1,2,0))
			cv2.imwrite("images/image"+str(randint(0,1000))+".png", image)
		
		x_scaled = []
		for idx, im in enumerate(x):
			im = im.cpu().detach().numpy()
			for patch_idx, patch in enumerate(im):
				patch = (255*(patch - np.min(patch))/np.ptp(patch)) 
				im[patch_idx] = patch
				#cv2.imwrite("patches/patches_"+str(idx)+"_"+str(patch_idx)+".png", patch)
			x_scaled.append(im)
		x = torch.tensor(x_scaled).cuda()   
		'''

		# x2 = self.features(img)
		y = rearrange(x, 'b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=p, p2=p)
		# y2 = rearrange(x2, 'b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1 = p, p2 = p)
		y = self.patch_to_embedding(y)
		cls_tokens = self.cls_token.expand(x.shape[0], -1, -1)
		x = torch.cat((cls_tokens, y), 1)
		shape = x.shape[0]
		# x += self.pos_embedding[0:shape]
		x += self.pos_embedding[:, :(self.num_patches + 1)]
		x = self.dropout(x)
		x = self.transformer(x)
		x = self.to_cls_token(x[:, 0])

		return self.mlp_head(x)

Inquiry Regarding Dataset Architecture and Organization

I hope this message finds you well. I would like to express my gratitude for the significant contributions you have made to the project. Your work has been truly inspiring and has provided me with valuable insights into the tasks I am aiming to accomplish.

Unfortunately, due to various reasons, I have encountered challenges in accessing the dataset hosted on dfdc.ai. Instead, I have been able to locate the dfdc dataset within the Kaggle project. I am reaching out to kindly inquire if you could provide some insights into the specific architecture of the dataset and the roles played by each of its components.

Additionally, I have a specific question regarding the contents of the DFDC folder. Are the data stored in this folder a combination of video frames with both FAKE and REAL attributes? Or do all REAL video frames exclusively reside in the ORIGINAL folder, while the remaining FAKE video frames are distributed across folders such as DFDC, FACE++, and others?

I understand that your time is valuable, and I genuinely appreciate your consideration in addressing my inquiries amidst your busy schedule.

Thank you for your time and assistance.

about dataset

Thanks for your contribution @davide-coccomini
I have some questions about the training data (screenshot not included).
In the sixth line of the script utils.py, why did you filter out the "DeepFakeDetection" and "actors" parts?
Looking forward to your reply, thanks again!
