This repository contains code and resources for image classification with Vision Transformer (ViT) models. Vision Transformers have become popular for computer vision tasks and perform competitively with traditional convolutional neural networks.
This project explores and implements Vision Transformer models for image classification. The repository provides code and resources to train, evaluate, and deploy these models on various datasets, along with pre-trained models and scripts for fine-tuning them on custom datasets.
- Implementation of Vision Transformer models using popular deep learning frameworks such as PyTorch and TensorFlow.
- Support for various image classification datasets, including CIFAR-10, ImageNet, and custom datasets.
- Pre-trained Vision Transformer models for transfer learning and fine-tuning on new datasets.
- Training scripts with customizable hyperparameters and training configurations.
- Evaluation scripts to measure the performance of Vision Transformer models.
- Inference scripts for deploying trained models on new images.
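
To illustrate the kind of metric an evaluation script typically reports, here is a minimal sketch of top-k accuracy in plain Python. It is not the repository's evaluation code: the function name `top_k_accuracy` and its arguments are hypothetical, and the actual scripts may rely on scikit-learn or framework utilities instead.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one per sample (hypothetical layout).
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Three samples, three classes; the second sample is wrong at top-1.
    example_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    example_labels = [1, 1, 2]
    print(top_k_accuracy(example_scores, example_labels, k=1))
    print(top_k_accuracy(example_scores, example_labels, k=2))
```

Top-1 accuracy is the usual headline number for CIFAR-10, while ImageNet results are commonly reported as both top-1 and top-5.
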
Make sure you have the following dependencies installed:
- Python 3.7+
- PyTorch (for PyTorch-based implementation) or TensorFlow (for TensorFlow-based implementation)
- NumPy
- Matplotlib
- scikit-learn

Please refer to the individual implementation directories for specific version requirements.
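
As a quick way to confirm the dependencies listed above are present, the following sketch checks the Python version and looks up each package with the standard library. The helper name `check_environment` is hypothetical and not part of this repository; the import names (`torch`, `tensorflow`, `numpy`, `matplotlib`, `sklearn`) are the usual ones for these packages. It does not check package versions.

```python
import sys
from importlib.util import find_spec

# Import names for the packages listed in the requirements.
REQUIRED = ("numpy", "matplotlib", "sklearn")
# At least one of these frameworks should be installed.
FRAMEWORKS = ("torch", "tensorflow")


def check_environment(find=find_spec, version=sys.version_info):
    """Return a list of human-readable problems; empty if all checks pass."""
    problems = []
    if tuple(version[:2]) < (3, 7):
        problems.append("Python 3.7+ is required")
    for name in REQUIRED:
        # find_spec returns None when a top-level package is not installed.
        if find(name) is None:
            problems.append(f"missing package: {name}")
    if all(find(name) is None for name in FRAMEWORKS):
        problems.append("install PyTorch or TensorFlow")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(issue)
    if not issues:
        print("All listed dependencies were found.")
```

The `find` and `version` parameters exist only so the check can be exercised without changing the real environment.
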