
fada's Introduction

Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation (ECCV 2020)

This is a pytorch implementation of FADA.

Prerequisites

  • Python 3.6
  • Pytorch 1.2.0
  • torchvision from master
  • yacs
  • matplotlib
  • GCC >= 4.9
  • OpenCV
  • CUDA >= 9.0

Step-by-step installation

conda create --name fada -y python=3.6
conda activate fada

# this installs the right pip and dependencies for the fresh python
conda install -y ipython pip

pip install ninja yacs cython matplotlib tqdm opencv-python imageio mmcv

# follow PyTorch installation in https://pytorch.org/get-started/locally/
# we give the instructions for CUDA 9.2
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=9.2 -c pytorch

Getting started

ln -s /path_to_gta5_dataset datasets/gta5
ln -s /path_to_synthia_dataset datasets/synthia
ln -s /path_to_cityscapes_dataset datasets/cityscapes
  • Generate the label statistics file for the GTA5 and SYNTHIA datasets by running
python datasets/generate_gta5_label_info.py -d datasets/gta5 -o datasets/gta5/
python datasets/generate_synthia_label_info.py -d datasets/synthia -o datasets/synthia/

The data folder should be structured as follows:

├── datasets/
│   ├── cityscapes/
│   │   ├── gtFine/
│   │   ├── leftImg8bit/
│   ├── gta5/
│   │   ├── images/
│   │   ├── labels/
│   │   ├── gtav_label_info.p
│   ├── synthia/
│   │   ├── RAND_CITYSCAPES/
│   │   ├── synthia_label_info.p
│   └── ...

Train

We provide a training script that uses 4 Tesla P40 GPUs. Note that when generating pseudo labels for self-distillation, the link to the pseudo-label directory should be updated accordingly.

bash train_with_sd.sh

Evaluate

python test.py -cfg configs/deeplabv2_r101_tgt_self_distill.yaml resume g2c_sd.pth

Tip: For those interested in how performance changes during adversarial training, test.py also accepts a directory as input; the results will be stored in a CSV file.
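
For example, pointing resume at a directory of checkpoints instead of a single .pth file (the directory name below is hypothetical):

python test.py -cfg configs/deeplabv2_r101_tgt_self_distill.yaml resume path_to_checkpoint_dir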

Pretrained weights

Our pretrained models for the SYNTHIA -> Cityscapes task (s2c) and the GTA5 -> Cityscapes task (g2c) are available via Google Drive.

Visualization results


Acknowledgements

Some code is adapted from maskrcnn-benchmark and semseg. We thank the authors for their excellent projects.

Citation

If you find this code useful, please consider citing:

@InProceedings{Haoran_2020_ECCV,
  author = {Wang, Haoran and Shen, Tong and Zhang, Wei and Duan, Lingyu and Mei, Tao},
  title = {Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation},
  booktitle = {The European Conference on Computer Vision (ECCV)},
  month = {August},
  year = {2020}
} 

fada's People

Contributors

krumo


fada's Issues

Problems when saving soft labels for self-distillation

The process of saving soft labels happens in test.py:

mask = get_color_pallete(pred, "city")
mask_filename = name[0] if len(name[0].split("/"))<2 else name[0].split("/")[1]
mask.save(os.path.join(output_folder, mask_filename))

You saved them as color-palette PIL images (if I understand correctly).

But when you load them in cityscapes_self_distill.py:

label = np.array(Image.open(datafile["label"]),dtype=np.float32)

Then, you add the ignore label and apply the ID mapping:

label_copy = self.ignore_label * np.ones(label.shape, dtype=np.float32)
for k, v in self.id_to_trainid.items():
    label_copy[label == k] = v

for k in self.trainid2name.keys():
    label_copy[label == k] = k

label = Image.fromarray(label_copy)

I think the images being loaded are not stored as label IDs but as color codes.

Please show me where I'm wrong, thanks.
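
For reference, a minimal sketch of the alternative being suggested: save the raw predicted train IDs as a single-channel PNG instead of a color-palette image, so that the ID mapping in cityscapes_self_distill.py applies directly (the helper name below is hypothetical, not from the repo):

import os
import numpy as np
from PIL import Image

def save_trainid_png(pred, name, output_folder):
    # pred: an (H, W) numpy array of predicted train IDs.
    # Saving as a single-channel ("L" mode) PNG keeps the pixel values equal to the
    # train IDs, rather than producing a color-palette visualization.
    mask = Image.fromarray(pred.astype(np.uint8))
    mask_filename = name[0] if len(name[0].split("/")) < 2 else name[0].split("/")[1]
    mask.save(os.path.join(output_folder, mask_filename))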

How to implement CCD in PyTorch?

Hi,
I've recently been trying to implement CCD in PyTorch on the source and target training data. However, the "sky" class in "g2c_sd.pth" usually gets a very large intra-class distance (ResNet-101). The following pseudo-code is how I calculate the intra-class distance between the features belonging to class i and the class center i (see the runnable sketch after this question):

  1. tensor = x - mu_i
  2. tensor = tensor[label == i]
  3. intra-distance = ((torch.norm(tensor, p=1, dim=1)) ** 2 / 2048).sum() / number of pixels belonging to class i

where the shape of each variable is:
x: [bs, 2048, num of pixels belonging to class i]
mu_i: [1, 2048]
tensor: [bs, 2048, num of pixels belonging to class i]

I calculate the inter-class distance in a similar way: replace step 1 with tensor = mu_i - mu_j.
Am I correct?

Best regards,
Ut
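
A runnable PyTorch version of the computation described in this question (the shapes follow the question above; this is a sketch of the asker's formula, not the authors' released code):

import torch

def intra_class_distance(x, mu_i, label, i, feat_dim=2048):
    # x:     (bs, feat_dim, num_pixels) pixel features
    # mu_i:  (1, feat_dim) center of class i
    # label: (bs, num_pixels) per-pixel class IDs
    diff = x - mu_i.unsqueeze(-1)               # step 1: subtract the class center
    diff = diff.permute(0, 2, 1)[label == i]    # step 2: keep pixels of class i -> (n_i, feat_dim)
    if diff.shape[0] == 0:
        return x.new_tensor(0.0)
    # step 3: norm per pixel (p=1 follows the question; p=2 would give the usual
    # squared Euclidean distance), scaled by feat_dim and averaged over the class-i pixels
    return ((torch.norm(diff, p=1, dim=1) ** 2) / feat_dim).sum() / diff.shape[0]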

How do you use thresholds in your code?

Good job! However, I have some questions.
As mentioned in the paper, soft labels are also filtered by the threshold. In the code, I only see this:
tgt_soft_label[tgt_soft_label>0.9] = 0.9
Please explain this code. Why don't you filter out confidences lower than the threshold?
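
For context, a small sketch contrasting the two operations being discussed: clamping over-confident probabilities (what the line above does) versus discarding low-confidence pixels (what the question asks about). This is illustrative only; the threshold and masking strategy are assumptions:

import torch

tgt_soft_label = torch.softmax(torch.randn(2, 19, 64, 128), dim=1)  # (N, C, H, W) probabilities

# What the line above does: cap over-confident probabilities at 0.9.
clamped = tgt_soft_label.clone()
clamped[clamped > 0.9] = 0.9

# What the question asks about: zero out pixels whose maximum confidence falls
# below a threshold so that they contribute nothing to the loss.
conf, _ = tgt_soft_label.max(dim=1, keepdim=True)
filtered = tgt_soft_label * (conf >= 0.9).float()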

There is a bit of inconsistency between the code and the paper

Hi, I find there is a bit of inconsistency between the code and the paper.
In the code, the adversarial loss is calculated by
loss_adv_tgt = 0.001*soft_label_cross_entropy(tgt_D_pred, torch.cat((tgt_soft_label, torch.zeros_like(tgt_soft_label)), dim=1))
in which you calculate the loss between the target prediction and the target labels.

However, in the paper, the adversarial loss is written like this:
[equation image from the paper]
The adversarial loss is calculated between the target label and the source prediction.
right?
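
For reference, one common way such a soft-label cross-entropy can be written (an assumption about the general form, not necessarily the repo's exact soft_label_cross_entropy):

import torch.nn.functional as F

def soft_label_cross_entropy(pred, soft_label):
    # pred:       (N, 2K, H, W) fine-grained discriminator logits
    # soft_label: (N, 2K, H, W) per-pixel weights, e.g. the concatenated soft label above
    log_prob = F.log_softmax(pred, dim=1)
    return -(soft_label * log_prob).sum(dim=1).mean()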

Why is the test input size in the self-distillation phase 1024*2048 instead of 512*1024 as before?

@wyvernbai @krumo Firstly, thanks a lot for your great work. However, I have a question about the test input size when evaluating the self-distillation model, because I have seen that you set INPUT_SIZE_TEST: (2048, 1024) in the file configs/deeplabv2_r101_tgt_self_distill.yaml.

I am a little confused because when evaluating other models, the INPUT_SIZE_TEST is (1024, 512). Why do you change it to (2048, 1024) when evaluating the self-distillation model? In my opinion, it may cause an unfair comparison, since intuitively the larger the input size is, the better the performance will be. Also, as far as I know, most previous works use (1024, 512) as the test input size.

Do I miss something? Thanks a lot. Looking forward to your reply!

GTA5 generate_gta5_label_info.py

In the GTA5 dataset, the labels are colored (RGB style), so I can't get the class IDs from the GTA5 labels, and I don't know how to translate GTA5 to Cityscapes so that the same object has the same class ID in both datasets. Do you know how to translate the RGB labels to gray labels (I mean labels whose pixel values equal the class IDs)?
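
A hedged sketch of one way to convert a color-coded label map into a single-channel ID map, assuming the labels use the standard Cityscapes color palette (only a few classes are listed; the table would need to be extended with the remaining colors). This helper is not part of this repository:

import numpy as np
from PIL import Image

# Partial Cityscapes palette: RGB color -> train ID (extend with the remaining classes).
COLOR_TO_TRAINID = {
    (128, 64, 128): 0,   # road
    (244, 35, 232): 1,   # sidewalk
    (70, 70, 70): 2,     # building
    (107, 142, 35): 8,   # vegetation
    (70, 130, 180): 10,  # sky
    (220, 20, 60): 11,   # person
    (0, 0, 142): 13,     # car
}

def rgb_label_to_trainid(path, ignore_label=255):
    rgb = np.array(Image.open(path).convert("RGB"))
    out = np.full(rgb.shape[:2], ignore_label, dtype=np.uint8)
    for color, train_id in COLOR_TO_TRAINID.items():
        out[np.all(rgb == color, axis=-1)] = train_id
    return out  # (H, W) array whose pixel values are the train IDs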

VGG pre-trained model

Hi, thank you for your awesome work! Would you mind releasing the VGG pre-trained weights with and without the self-distillation step? By the way, the FADA mIoU of 43.8 shown in the paper is for the VGG model with the self-distillation step, right?

Best regards,
Ut

comparison with previous art

Thanks for your work! I am curious about the difference between this paper and previous methods for category-wise feature alignment at the idea level.

For example, Chen et al. (ICCV'17) proposed to align the feature distributions with global and class-wise discriminators. The loss defined in equation (9) of their paper is quite similar to your proposed loss. Besides, Du et al. (ICCV'19) also adopt a similar idea without global alignment.

Despite some differences in implementation details, could you explain the main differences between your proposed method and the two papers mentioned?

Pre-trained models

Hi, thank you for your work and for releasing the code! It is quite interesting since it is considerably different from recent techniques. I was wondering if you could release the pretrained weights for FADA without the self-distillation step. It would be interesting to build on this step, since self-distillation is a general strategy that can be applied to any model.
Thanks in advance.

Why do you use tgt_soft_label.detach() in Ladv?

Thank you for sharing the code. I have a question related to Ladv: why do you use detach() in Ladv in this line:

tgt_soft_label = tgt_soft_label.detach()

I ask this question because, in training the semantic segmentation network, don't we need tgt_soft_label to fool the discriminator model?

SYNTHIA model evaluation

Hi! Thanks for sharing the work!

Because the result is not reported in the paper, I want to make sure: to evaluate the given SYNTHIA model without self-distillation and MST (s2c_adv.pth), is configs/deeplabv2_r101_adv_synthia.yaml the correct YAML argument for running test.py? The resulting mIoU is 0.4091.

Thanks.

Performance on Synthia -> Cityscapes

Hi, thank you for your great work. I'm trying to reproduce the results of Synthia -> Cityscapes, but the performance is not as good as yours. Here are 2 questions:

In your pretrained s2c_adv.pth, the mIoU is around 38% on the validation set. Did you use this checkpoint to generate pseudo labels for self distill training and achieve mIoU 45%?

I found that in the self-distill config file of GTA5 -> Cityscapes, you set freeze batch norm to False, while for the same step of Synthia -> Cityscapes, freeze batch norm is set to True. Why do we need this change? Does it affect the final performance?

If possible, could you provide a script for reproducing the Synthia -> Cityscapes result, just like train_with_sd.sh?

Question

I recently read this paper, and there is a problem I couldn't figure out: how is μi in the CCD formula calculated? Does it use only source samples, or both source and target samples?
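
For reference, a minimal sketch of one common way to compute class centers from a batch of features (an illustration, not the authors' code; whether the features come from source only or from both source and target is exactly the question above):

import torch

def class_centers(features, labels, num_classes):
    # features: (N, C, H, W) backbone features; labels: (N, H, W) train IDs at the same resolution
    n, c, h, w = features.shape
    feats = features.permute(0, 2, 3, 1).reshape(-1, c)  # (N*H*W, C)
    labs = labels.reshape(-1)                            # (N*H*W,)
    centers = torch.zeros(num_classes, c, device=features.device)
    for i in range(num_classes):
        mask = labs == i
        if mask.any():
            centers[i] = feats[mask].mean(dim=0)         # mu_i: mean feature of class i
    return centers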

Performance on Cityscapes to Cross-City

Hi, thank you for your great work. I'm trying to reproduce the results of Cityscapes to Cross-City, but the performance is not as good as yours. Here are some questions:

In your paper (Table 1), for the Cityscapes to Rio task, the mIoUs of Source DeepLab-v2 and AdaptSegNet are 48.2 and 51.6, respectively. But I got an mIoU of 44.9 for Source DeepLab-v2 and 47.0 for AdaptSegNet. Therefore, there is performance degradation for FADA and FADA w/ self-distillation.

The config file of Source DeepLab-v2:

MODEL:
  NAME: "deeplab_resnet101"
  WEIGHTS: 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth'
  FREEZE_BN: True
  NUM_CLASSES: 19
DATASETS:
  SOURCE_TRAIN: "cityscapes_train"
  TEST: "crosscity_Rio_test"
INPUT:
  SOURCE_INPUT_SIZE_TRAIN: (1024, 512)
  TARGET_INPUT_SIZE_TRAIN: (1024, 512)
  INPUT_SIZE_TEST: (2048, 1024)
SOLVER:
  BASE_LR: 5e-4
  MAX_ITER: 31250
  STOP_ITER: 20000
  BATCH_SIZE: 8
  BATCH_SIZE_VAL: 1

And the config file of AdaptSegNet:

MODEL:
  NAME: "deeplab_resnet101"
  WEIGHTS: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth"
  FREEZE_BN: True
  NUM_CLASSES: 19
DATASETS:
  SOURCE_TRAIN: "cityscapes_train"
  TARGET_TRAIN: "crosscity_Rio_train"
  TEST: "crosscity_Rio_test"
INPUT:
  SOURCE_INPUT_SIZE_TRAIN: (1024, 512)
  TARGET_INPUT_SIZE_TRAIN: (1024, 512)
  INPUT_SIZE_TEST: (2048, 1024)
  BRIGHTNESS: 0.5
  CONTRAST: 0.5
  SATURATION: 0.5
  HUE: 0.2
SOLVER:
  BASE_LR: 2.5e-4
  BASE_LR_D: 1e-4
  MAX_ITER: 62500
  STOP_ITER: 40000
  BATCH_SIZE: 8

If possible, could you provide the config files and data loader for reproducing the results?

Thank you anyway.

About LICENSE

@krumo
Thanks for sharing excellent code!
I'm curious what the license for this code is, because I'm thinking of using it in competitions like Kaggle.

train custom dataset

Hi, how can I train on my own dataset without label_info.p? I have the images and labels for binary segmentation.

Any guidance will be so helpful!

About Cross-City?

Hi! Great work!
I have a quick question: when I conduct adaptation between real images from different cities, i.e., Cityscapes to Cross-City, since Cross-City only defines 13 major classes for annotation, should I set NUM_CLASSES=13 and adjust the id_to_trainid mapping in the dataset, or leave it unchanged and then report the results on the shared 13 classes?

Thanks in advance
binhui

About multi-scale testing?

Have you released the code to implement multi-scale testing? I cannot find the implementation of multi-scale testing in 'test.py'. Can you tell me how to achieve it, or where to find it in your code?
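
For what it's worth, multi-scale testing is commonly implemented along the following lines: run the model at several input scales (optionally with horizontal flipping), resize the class probabilities back to the original resolution, and average them. This is a generic sketch under assumed interfaces, not the authors' code:

import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_predict(model, image, scales=(0.75, 1.0, 1.25), flip=True):
    # image: (1, 3, H, W); assumes model(image) returns (1, num_classes, h, w) logits
    h, w = image.shape[-2:]
    prob_sum = 0.0
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        prob_sum = prob_sum + F.softmax(
            F.interpolate(model(scaled), size=(h, w), mode="bilinear", align_corners=False), dim=1)
        if flip:
            flipped_logits = model(torch.flip(scaled, dims=[-1]))
            prob_sum = prob_sum + F.softmax(
                F.interpolate(torch.flip(flipped_logits, dims=[-1]), size=(h, w),
                              mode="bilinear", align_corners=False), dim=1)
    return prob_sum.argmax(dim=1)  # (1, H, W) predicted train IDs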

Could not reproduce results for generate_synthia_label.py and the use of its result

I ran generate_synthia_label.py with both the default settings (nproc=16) and nproc=1, but got output files varying greatly in size. Specifically, the file synthia_label_info.p you provided is around 1322 KB, while my results are 458 KB for nproc=16 and 1157 KB for nproc=1. I am wondering what causes such a big difference.
Besides, would you be so kind as to tell me about the functionality of this step, as well as the performance boost it brings?

Freeze discriminator parameters before performing adversarial loss backward

Shouldn't the parameters of the discriminators be frozen before applying loss_adv_tgt.backward()? The purpose here is to adjust only the parameters of the feature extractor so that the target features become more similar to the source features. Therefore, I think the discriminator should be frozen before applying loss_adv_tgt.backward() at line 193 in train_adv.py.
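
A minimal sketch of the freezing pattern being proposed (the names model_D and loss_adv_tgt follow the discussion above and are otherwise assumed):

import torch.nn as nn

def set_requires_grad(module: nn.Module, flag: bool) -> None:
    # Toggle gradient computation for every parameter of a module.
    for p in module.parameters():
        p.requires_grad_(flag)

# Usage pattern around the adversarial backward pass:
#   set_requires_grad(model_D, False)   # freeze the discriminator
#   loss_adv_tgt.backward()             # gradients reach only the segmentation network
#   set_requires_grad(model_D, True)    # unfreeze before the discriminator's own update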

Unfair experimental settings and performance without training tricks?

Hi, thank you for your great work. I have some questions about your experimental details.

In your experiments, your settings include a large batch size (8), distributed training, synchronized batch normalization, and many data augmentation techniques (including changing brightness, contrast, saturation, and hue). But the other works you compare against (the ResNet-101 part of Table 3) mostly use very 'plain' training: a small batch size (1), single-card training, and barely any data augmentation. The training tricks you used can usually bring a huge boost to the final performance. I think a comparison under fair settings would show the real improvement of your method.

Why is self-distillation training better?

You trained on Cityscapes in self-distillation mode. I found that the training flow was not different from the train-source mode. I didn't understand why it was better. Can you provide some foundations or theory that can explain this? Thanks.
