
fada's Introduction

Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation (ECCV 2020)

This is a pytorch implementation of FADA.

Prerequisites

  • Python 3.6
  • Pytorch 1.2.0
  • torchvision from master
  • yacs
  • matplotlib
  • GCC >= 4.9
  • OpenCV
  • CUDA >= 9.0

Step-by-step installation

conda create --name fada -y python=3.6
conda activate fada

# this installs the right pip and dependencies for the fresh python
conda install -y ipython pip

pip install ninja yacs cython matplotlib tqdm opencv-python imageio mmcv

# follow PyTorch installation in https://pytorch.org/get-started/locally/
# we give the instructions for CUDA 9.2
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=9.2 -c pytorch

Getting started

ln -s /path_to_gta5_dataset datasets/gta5
ln -s /path_to_synthia_dataset datasets/synthia
ln -s /path_to_cityscapes_dataset datasets/cityscapes
  • Generate the label statistics file for the GTA5 and SYNTHIA datasets by running
python datasets/generate_gta5_label_info.py -d datasets/gta5 -o datasets/gta5/
python datasets/generate_synthia_label_info.py -d datasets/synthia -o datasets/synthia/

The data folder should be structured as follows:

├── datasets/
│   ├── cityscapes/
│   │   ├── gtFine/
│   │   ├── leftImg8bit/
│   ├── gta5/
│   │   ├── images/
│   │   ├── labels/
│   │   ├── gtav_label_info.p
│   ├── synthia/
│   │   ├── RAND_CITYSCAPES/
│   │   ├── synthia_label_info.p
│   └── ...

Train

We provide a training script that uses 4 Tesla P40 GPUs. Note that when generating pseudo labels for self-distillation, the link to the pseudo-label directory should be updated accordingly.

bash train_with_sd.sh

Evaluate

python test.py -cfg configs/deeplabv2_r101_tgt_self_distill.yaml resume g2c_sd.pth

Tip: For those interested in how performance changes during adversarial training, test.py also accepts a directory as input; the results will be stored in a CSV file.
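
For example, pointing resume at a directory of checkpoints instead of a single .pth file (the directory name below is hypothetical):

python test.py -cfg configs/deeplabv2_r101_tgt_self_distill.yaml resume path_to_checkpoint_dir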

Pretrained weights

Our pretrained models for the SYNTHIA -> Cityscapes task (s2c) and the GTA5 -> Cityscapes task (g2c) are available via Google Drive.

Visualization results


Acknowledgements

Some code is adapted from maskrcnn-benchmark and semseg. We thank the authors for their excellent projects.

Citation

If you find this code useful, please consider citing:

@InProceedings{Haoran_2020_ECCV,
  author = {Wang, Haoran and Shen, Tong and Zhang, Wei and Duan, Lingyu and Mei, Tao},
  title = {Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation},
  booktitle = {The European Conference on Computer Vision (ECCV)},
  month = {August},
  year = {2020}
} 

fada's People

Contributors

krumo


fada's Issues

Problems when saving soft labels for self-distillation

The process of saving soft labels happens in test.py:

mask = get_color_pallete(pred, "city")
mask_filename = name[0] if len(name[0].split("/"))<2 else name[0].split("/")[1]
mask.save(os.path.join(output_folder, mask_filename))

You saved them as color-palette PIL images (if I understand correctly).

But when you load them in cityscapes_self_distill.py:

label = np.array(Image.open(datafile["label"]),dtype=np.float32)

Then, you add the ignore label and apply the ID mapping:

label_copy = self.ignore_label * np.ones(label.shape, dtype=np.float32)
for k, v in self.id_to_trainid.items():
    label_copy[label == k] = v

for k in self.trainid2name.keys():
    label_copy[label == k] = k

label = Image.fromarray(label_copy)

I think the images being loaded are not stored as label IDs but as color codes.

Please show me where I'm wrong, thanks.
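
For reference, a minimal sketch of the alternative being suggested: save the raw predicted train IDs as a single-channel PNG instead of a color-palette image, so that the ID mapping in cityscapes_self_distill.py applies directly (the helper name below is hypothetical, not from the repo):

import os
import numpy as np
from PIL import Image

def save_trainid_png(pred, name, output_folder):
    # pred: an (H, W) numpy array of predicted train IDs.
    # Saving as a single-channel ("L" mode) PNG keeps the pixel values equal to the
    # train IDs, rather than producing a color-palette visualization.
    mask = Image.fromarray(pred.astype(np.uint8))
    mask_filename = name[0] if len(name[0].split("/")) < 2 else name[0].split("/")[1]
    mask.save(os.path.join(output_folder, mask_filename))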

How to implement CCD in PyTorch?

Hi,
I've recently been trying to implement CCD in PyTorch on the source and target training data. However, the "sky" class in "g2c_sd.pth" usually gets a very large intra-class distance (ResNet-101). The following pseudo-code is how I calculate the intra-class distance between the features belonging to class i and the class center i (see the runnable sketch after this question):

  1. tensor = x - mu_i
  2. tensor = tensor[label == i]
  3. intra-distance = ((torch.norm(tensor, p=1, dim=1)) ** 2 / 2048).sum() / number of pixels belonging to class i

where the shape of each variable is:
x: [bs, 2048, num of pixels belonging to class i]
mu_i: [1, 2048]
tensor: [bs, 2048, num of pixels belonging to class i]

I calculate the inter-class distance in a similar way: replace step 1 with tensor = mu_i - mu_j.
Am I correct?

Best regards,
Ut
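
A runnable PyTorch version of the computation described in this question (the shapes follow the question above; this is a sketch of the asker's formula, not the authors' released code):

import torch

def intra_class_distance(x, mu_i, label, i, feat_dim=2048):
    # x:     (bs, feat_dim, num_pixels) pixel features
    # mu_i:  (1, feat_dim) center of class i
    # label: (bs, num_pixels) per-pixel class IDs
    diff = x - mu_i.unsqueeze(-1)               # step 1: subtract the class center
    diff = diff.permute(0, 2, 1)[label == i]    # step 2: keep pixels of class i -> (n_i, feat_dim)
    if diff.shape[0] == 0:
        return x.new_tensor(0.0)
    # step 3: norm per pixel (p=1 follows the question; p=2 would give the usual
    # squared Euclidean distance), scaled by feat_dim and averaged over the class-i pixels
    return ((torch.norm(diff, p=1, dim=1) ** 2) / feat_dim).sum() / diff.shape[0]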

How do you use thresholds in your code?

Good job! However, I have some questions.
As mentioned in the paper, soft labels are also filtered by the threshold. In the code, I only see this:
tgt_soft_label[tgt_soft_label>0.9] = 0.9
Please explain this code. Why don't you filter out confidences lower than the threshold?
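
For context, a small sketch contrasting the two operations being discussed: clamping over-confident probabilities (what the line above does) versus discarding low-confidence pixels (what the question asks about). This is illustrative only; the threshold and masking strategy are assumptions:

import torch

tgt_soft_label = torch.softmax(torch.randn(2, 19, 64, 128), dim=1)  # (N, C, H, W) probabilities

# What the line above does: cap over-confident probabilities at 0.9.
clamped = tgt_soft_label.clone()
clamped[clamped > 0.9] = 0.9

# What the question asks about: zero out pixels whose maximum confidence falls
# below a threshold so that they contribute nothing to the loss.
conf, _ = tgt_soft_label.max(dim=1, keepdim=True)
filtered = tgt_soft_label * (conf >= 0.9).float()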

There is a bit of inconsistency between the code and the paper

Hi, I find there is a bit of inconsistency between the code and the paper.
In the code, the adversarial loss is calculated by
loss_adv_tgt = 0.001*soft_label_cross_entropy(tgt_D_pred, torch.cat((tgt_soft_label, torch.zeros_like(tgt_soft_label)), dim=1))
in which you calculate the loss between the target prediction and the target labels.

However, in the paper, the adversarial loss is written like this:
[equation image from the paper]
The adversarial loss is calculated between the target label and the source prediction.
right?
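
For reference, one common way such a soft-label cross-entropy can be written (an assumption about the general form, not necessarily the repo's exact soft_label_cross_entropy):

import torch.nn.functional as F

def soft_label_cross_entropy(pred, soft_label):
    # pred:       (N, 2K, H, W) fine-grained discriminator logits
    # soft_label: (N, 2K, H, W) per-pixel weights, e.g. the concatenated soft label above
    log_prob = F.log_softmax(pred, dim=1)
    return -(soft_label * log_prob).sum(dim=1).mean()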

Why is the test input size in the self-distillation phase 1024*2048 instead of 512*1024 as before?

@wyvernbai @krumo Firstly, thanks a lot for your great work. However, I have a question about the test input size when evaluating the self-distillation model, because I have seen that you set INPUT_SIZE_TEST: (2048, 1024) in the file configs/deeplabv2_r101_tgt_self_distill.yaml.

I am a little confused because when evaluating other models, the INPUT_SIZE_TEST is (1024, 512). Why do you change it to (2048, 1024) when evaluating the self-distillation model? In my opinion, it may cause an unfair comparison, since intuitively the larger the input size is, the better the performance will be. Also, as far as I know, most previous works use (1024, 512) as the test input size.

Do I miss something? Thanks a lot. Looking forward to your reply!

GTA5 generate_gta5_label_info.py

In the GTA5 dataset, the labels are colored (RGB style), so I can't get the class IDs from the GTA5 labels, and I don't know how to translate GTA5 to Cityscapes so that the same object has the same class ID in both datasets. Do you know how to translate the RGB labels to gray labels (I mean labels whose pixel values equal the class IDs)?
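
A hedged sketch of one way to convert a color-coded label map into a single-channel ID map, assuming the labels use the standard Cityscapes color palette (only a few classes are listed; the table would need to be extended with the remaining colors). This helper is not part of this repository:

import numpy as np
from PIL import Image

# Partial Cityscapes palette: RGB color -> train ID (extend with the remaining classes).
COLOR_TO_TRAINID = {
    (128, 64, 128): 0,   # road
    (244, 35, 232): 1,   # sidewalk
    (70, 70, 70): 2,     # building
    (107, 142, 35): 8,   # vegetation
    (70, 130, 180): 10,  # sky
    (220, 20, 60): 11,   # person
    (0, 0, 142): 13,     # car
}

def rgb_label_to_trainid(path, ignore_label=255):
    rgb = np.array(Image.open(path).convert("RGB"))
    out = np.full(rgb.shape[:2], ignore_label, dtype=np.uint8)
    for color, train_id in COLOR_TO_TRAINID.items():
        out[np.all(rgb == color, axis=-1)] = train_id
    return out  # (H, W) array whose pixel values are the train IDs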

VGG pre-trained model

Hi, thank you for your awesome work! Would you mind releasing the VGG pre-trained weights with and without the self-distillation step? By the way, the FADA mIoU of 43.8 shown in the paper is for the VGG model with the self-distillation step, right?

Best regards,
Ut

comparison with previous art

Thanks for your work! I am curious about the difference between this paper and previous methods for category-wise feature alignment at the idea level.

For example, Chen et al. (ICCV'17) proposed to align the feature distributions with global and class-wise discriminators. The loss defined in equation (9) of their paper is quite similar to your proposed loss. Besides, Du et al. (ICCV'19) also adopt a similar idea without global alignment.

Despite some differences in implementation details, could you explain the main differences between your proposed method and the two papers mentioned?

Pre-trained models

Hi, thank you for your work and for releasing the code! It is quite interesting since it is considerably different from recent techniques. I was wondering if you could release the pretrained weights for FADA without the self-distillation step. It would be interesting to build on this step, since self-distillation is a general strategy that can be applied to any model.
Thanks in advance.

Why do you use tgt_soft_label.detach() in Ladv?

Thank you for sharing the code. I have a question related to Ladv: why do you use detach() in Ladv in this line:

tgt_soft_label = tgt_soft_label.detach()

I ask this question because, in training the semantic segmentation network, don't we need tgt_soft_label to fool the discriminator model?

SYNTHIA model evaluation

Hi! Thanks for sharing the work!

Because the result is not reported in the paper, I want to make sure: to evaluate the given SYNTHIA model without self-distillation and MST (s2c_adv.pth), is configs/deeplabv2_r101_adv_synthia.yaml the correct YAML argument for running test.py? The resulting mIoU is 0.4091.

Thanks.

Performance on Synthia -> Cityscapes

Hi, thank you for your great work. I'm trying to reproduce the results of Synthia -> Cityscapes, but the performance is not as good as yours. Here are 2 questions:

In your pretrained s2c_adv.pth, the mIoU is around 38% on the validation set. Did you use this checkpoint to generate pseudo labels for self distill training and achieve mIoU 45%?

I found that in the self-distill config file of GTA5 -> Cityscapes, you set freeze batch norm to False, while for the same step of Synthia -> Cityscapes, freeze batch norm is set to True. Why do we need this change? Does it affect the final performance?

If possible, could you provide a script for reproducing the Synthia -> Cityscapes result, just like train_with_sd.sh?

Question

I recently read this paper, and there is a problem I couldn't figure out: how is μi in the CCD formula calculated? Does it use only source samples, or both source and target samples?
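
For reference, a minimal sketch of one common way to compute class centers from a batch of features (an illustration, not the authors' code; whether the features come from source only or from both source and target is exactly the question above):

import torch

def class_centers(features, labels, num_classes):
    # features: (N, C, H, W) backbone features; labels: (N, H, W) train IDs at the same resolution
    n, c, h, w = features.shape
    feats = features.permute(0, 2, 3, 1).reshape(-1, c)  # (N*H*W, C)
    labs = labels.reshape(-1)                            # (N*H*W,)
    centers = torch.zeros(num_classes, c, device=features.device)
    for i in range(num_classes):
        mask = labs == i
        if mask.any():
            centers[i] = feats[mask].mean(dim=0)         # mu_i: mean feature of class i
    return centers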

Performance on Cityscapes to Cross-City

Hi, thank you for your great work. I'm trying to reproduce the results of Cityscapes to Cross-City, but the performance is not as good as yours. Here are some questions:

In your paper (Table 1), for the Cityscapes to Rio task, the mIoUs of Source DeepLab-v2 and AdaptSegNet are 48.2 and 51.6, respectively. But I got an mIoU of 44.9 for Source DeepLab-v2 and 47.0 for AdaptSegNet. Therefore, there is performance degradation for FADA and FADA w/ self-distillation.

The config file of Source DeepLab-v2:

MODEL:
  NAME: "deeplab_resnet101"
  WEIGHTS: 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth'
  FREEZE_BN: True
  NUM_CLASSES: 19
DATASETS:
  SOURCE_TRAIN: "cityscapes_train"
  TEST: "crosscity_Rio_test"
INPUT:
  SOURCE_INPUT_SIZE_TRAIN: (1024, 512)
  TARGET_INPUT_SIZE_TRAIN: (1024, 512)
  INPUT_SIZE_TEST: (2048, 1024)
SOLVER:
  BASE_LR: 5e-4
  MAX_ITER: 31250
  STOP_ITER: 20000
  BATCH_SIZE: 8
  BATCH_SIZE_VAL: 1

And the config file of AdaptSegNet:

MODEL:
  NAME: "deeplab_resnet101"
  WEIGHTS: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth"
  FREEZE_BN: True
  NUM_CLASSES: 19
DATASETS:
  SOURCE_TRAIN: "cityscapes_train"
  TARGET_TRAIN: "crosscity_Rio_train"
  TEST: "crosscity_Rio_test"
INPUT:
  SOURCE_INPUT_SIZE_TRAIN: (1024, 512)
  TARGET_INPUT_SIZE_TRAIN: (1024, 512)
  INPUT_SIZE_TEST: (2048, 1024)
  BRIGHTNESS: 0.5
  CONTRAST: 0.5
  SATURATION: 0.5
  HUE: 0.2
SOLVER:
  BASE_LR: 2.5e-4
  BASE_LR_D: 1e-4
  MAX_ITER: 62500
  STOP_ITER: 40000
  BATCH_SIZE: 8

If possible, could you provide the config files and data loader for reproducing the results?

Thank you anyway.

About LICENSE

@krumo
Thanks for sharing excellent code!
I'm curious what the license for this code is, because I'm thinking of using it in competitions like Kaggle.

train custom dataset

Hi, how can I train on my own dataset without label_info.p? I have the images and labels for binary segmentation.

Any guidance will be so helpful!

About Cross-City?

Hi! Great work!
I have a quick question: when I conduct adaptation between real images from different cities, i.e., Cityscapes to Cross-City, since Cross-City only defines 13 major classes for annotation, should I set NUM_CLASSES=13 and adjust the id_to_trainid mapping in the dataset, or leave it unchanged and then report the results on the shared 13 classes?

Thanks in advance
binhui

About multi-scale testing?

Have you released the code to implement multi-scale testing? I cannot find the implementation of multi-scale testing in 'test.py'. Can you tell me how to achieve it, or where to find it in your code?
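
For what it's worth, multi-scale testing is commonly implemented along the following lines: run the model at several input scales (optionally with horizontal flipping), resize the class probabilities back to the original resolution, and average them. This is a generic sketch under assumed interfaces, not the authors' code:

import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_predict(model, image, scales=(0.75, 1.0, 1.25), flip=True):
    # image: (1, 3, H, W); assumes model(image) returns (1, num_classes, h, w) logits
    h, w = image.shape[-2:]
    prob_sum = 0.0
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        prob_sum = prob_sum + F.softmax(
            F.interpolate(model(scaled), size=(h, w), mode="bilinear", align_corners=False), dim=1)
        if flip:
            flipped_logits = model(torch.flip(scaled, dims=[-1]))
            prob_sum = prob_sum + F.softmax(
                F.interpolate(torch.flip(flipped_logits, dims=[-1]), size=(h, w),
                              mode="bilinear", align_corners=False), dim=1)
    return prob_sum.argmax(dim=1)  # (1, H, W) predicted train IDs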

Could not reproduce results for generate_synthia_label.py and the use of its result

I ran generate_synthia_label.py with both the default settings (nproc=16) and nproc=1, but got output files varying greatly in size. Specifically, the file synthia_label_info.p you provided is around 1322 KB, while my results are 458 KB for nproc=16 and 1157 KB for nproc=1. I am wondering what causes such a big difference.
Besides, would you be so kind as to tell me about the functionality of this step, as well as the performance boost it brings?

Freeze discriminator parameters before performing adversarial loss backward

Shouldn't the parameters of the discriminators be frozen before applying loss_adv_tgt.backward()? The purpose here is to adjust only the parameters of the feature extractor so that the target features become more similar to the source features. Therefore, I think the discriminator should be frozen before applying loss_adv_tgt.backward() at line 193 in train_adv.py.
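
A minimal sketch of the freezing pattern being proposed (the names model_D and loss_adv_tgt follow the discussion above and are otherwise assumed):

import torch.nn as nn

def set_requires_grad(module: nn.Module, flag: bool) -> None:
    # Toggle gradient computation for every parameter of a module.
    for p in module.parameters():
        p.requires_grad_(flag)

# Usage pattern around the adversarial backward pass:
#   set_requires_grad(model_D, False)   # freeze the discriminator
#   loss_adv_tgt.backward()             # gradients reach only the segmentation network
#   set_requires_grad(model_D, True)    # unfreeze before the discriminator's own update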

Unfair experimental settings and performance without training tricks?

Hi, thank you for your great work. I have some questions about your experimental details.

In your experiments, your settings include a large batch size (8), distributed training, synchronized batch normalization, and many data augmentation techniques (including changing brightness, contrast, saturation, and hue). But the other works you compare against (the ResNet-101 part of Table 3) mostly use very 'plain' training: a small batch size (1), single-card training, and barely any data augmentation. The training tricks you used can usually bring a huge boost to the final performance. I think a comparison under fair settings would show the real improvement of your method.

Why is self-distillation training better?

You trained on Cityscapes in self-distillation mode. I found that the training flow was not different from the train-source mode. I didn't understand why it was better. Can you provide some foundations or theory that can explain this? Thanks.
