We use the filtered Visual Genome data widely used in the scene graph generation community.
Please see the public repository of the paper Unbiased Scene Graph Generation for instructions on downloading this dataset. After downloading the dataset, you should have the following 4 files:
VG_100K directory containing all the images
VG-SGG-with-attri.h5
VG-SGG-dicts-with-attri.json (Can be found in the same repository here)
image_data.json (Can be found in the same repository here)
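As a quick sanity check after downloading, a small shell snippet like the following can confirm the expected layout. The `datasets/vg` path is an assumption, not a path mandated by the repo; point `DATA_DIR` at wherever you placed the files.

```shell
# Sanity check for the expected Visual Genome layout.
# DATA_DIR is an assumption; adjust it to your download location.
DATA_DIR=datasets/vg
for f in VG-SGG-with-attri.h5 VG-SGG-dicts-with-attri.json image_data.json; do
  [ -f "$DATA_DIR/$f" ] || echo "missing file: $DATA_DIR/$f"
done
[ -d "$DATA_DIR/VG_100K" ] || echo "missing directory: $DATA_DIR/VG_100K"
```

If the snippet prints nothing, all four items are in place.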
Train Iterative Model
To enable faster model convergence, we pre-train DETR on Visual Genome. We replicate the DETR decoder weights three times and use them to initialize our model's three decoders. For convenience, the pre-trained weights (with the decoder replication) are made available here. To use these weights during training, simply pass the MODEL.WEIGHTS <Path to downloaded checkpoint> flag in the training command.
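A launch command could look like the sketch below, assuming a detectron2-style entry point. The script name and config path are placeholders of ours, not the repo's exact files; only the MODEL.WEIGHTS override comes from the instructions above.

```shell
# Sketch of a training launch using the pre-trained (replicated) DETR weights.
# train_net.py and the config path are placeholders; substitute the repo's actual ones.
python train_net.py --num-gpus 4 \
    --config-file configs/iterative_model.yaml \
    MODEL.WEIGHTS /path/to/pretrained_detr_checkpoint.pth
```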
MODEL.DETR.UNDERSAMPLE_PARAM should be set to twice the desired β value (e.g., for β = 0.75, use MODEL.DETR.UNDERSAMPLE_PARAM 1.5).
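The doubling can be computed directly on the command line; the `beta_to_param` helper below is hypothetical shorthand of ours, not part of the codebase.

```shell
# Convert a desired beta into the value passed to MODEL.DETR.UNDERSAMPLE_PARAM:
# the flag value is simply 2 * beta. beta_to_param is a hypothetical helper.
beta_to_param() { awk -v b="$1" 'BEGIN { printf "%g", 2 * b }'; }
beta_to_param 0.75   # prints 1.5
```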
Note
If the code fails, try running it on a single GPU first so that some preprocessed files can be generated; this is a one-time step. Once the code runs successfully on a single GPU, you can run it on multiple GPUs as well. Additionally, the code is configured by default to run on 4 GPUs with a batch size of 12. If you run out of memory, change the batch size with the flag SOLVER.IMS_PER_BATCH <NUM IMAGES IN BATCH>.
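The two-step workflow above could be sketched as follows; as before, the script name and config path are placeholders of ours, and only the GPU counts and the SOLVER.IMS_PER_BATCH override come from the note itself.

```shell
# One-time single-GPU run to let the preprocessed files be generated.
# train_net.py and the config path are placeholders for the repo's actual files.
python train_net.py --num-gpus 1 --config-file configs/iterative_model.yaml

# Subsequent multi-GPU run, with a reduced batch size in case of OOM errors.
python train_net.py --num-gpus 4 --config-file configs/iterative_model.yaml \
    SOLVER.IMS_PER_BATCH 6
```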
@inproceedings{khandelwal2022iterative,
title = {Iterative Scene Graph Generation},
author = {Siddhesh Khandelwal and Leonid Sigal},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2022}
}
CenterMask
@inproceedings{lee2020centermask,
title = {CenterMask: Real-Time Anchor-Free Instance Segmentation},
author = {Lee, Youngwan and Park, Jongyoul},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2020}
}