- Several important packages
- torch == 2.0.0
- torchaudio == 2.0.1
- transformers == 4.29.0
- accelerate == 0.19.0
- deepspeed == 0.8.3
- Windows10
- Ubuntu20.04
- GPU V100-16G * 4
- stage1 (Semantic Align Pre-training)
- stage2 (Caption-only Training)
- SCCM Pre-training
- Zero-shot Inference
- stage3 (Self-Reward Finetuning)