glaciohound / vcml

PyTorch implementation of the paper "Visual Concept-Metaconcept Learning", NeurIPS 2019

Home Page: http://vcml.csail.mit.edu

License: MIT License

Python 99.67% Shell 0.33%
Topics: natural-language-understanding, embedding-models, visual-question-answering

vcml's Introduction

Visual Concept-Metaconcept Learning (VCML)

This is a PyTorch implementation of the VCML paper.

Publication

Visual Concept-Metaconcept Learning
Chi Han*, Jiayuan Mao*, Chuang Gan, Joshua B. Tenenbaum, and Jiajun Wu
In Neural Information Processing Systems (NeurIPS) 2019
[Paper] [Project Page] [BibTeX]

@inproceedings{Han2019Visual,
	title={{Visual Concept Metaconcept Learning}},
	author={Han, Chi and Mao, Jiayuan and Gan, Chuang and Tenenbaum, Joshua B. and Wu, Jiajun},
	booktitle={Advances in Neural Information Processing Systems (NIPS)},
	year={2019},
}

Prerequisites

  • Python 3.6
  • PyTorch 1.2, with NVIDIA CUDA Support
  • Jacinle and other required Python packages specified by requirements.txt. See Getting started.

Getting started

Install Jacinle

To install Jacinle, first clone the repository and add its bin path to your PATH environment variable:

git clone https://github.com/vacancy/Jacinle --recursive
export PATH=<path_to_jacinle>/bin:$PATH

Clone repository and install other packages

Clone this repository:

git clone https://github.com/Glaciohound/VCML.git

Create a conda environment for VCML, and install the requirements:

conda create -n VCML python=3.6
conda activate VCML
conda install --file requirements.txt
pip install pytorch-transformers
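
Before moving on, a quick sanity check (a minimal sketch, not part of the repository) can confirm that PyTorch and CUDA are visible from the new environment:

import torch

print("PyTorch version:", torch.__version__)         # expect 1.2.x
print("CUDA available:", torch.cuda.is_available())  # should be True for training
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))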

The Dataset

Prepare the dataset.

We augmented both synthetic (CLEVR) and natural image (GQA, CUB-200) datasets with text-only metaconcept questions, and evaluated the performance of different models. We use programs to generate synthetic questions and answers, based on the ground-truth annotations of visual concepts.

To replicate the experiments, you will need to follow the instructions below:

First, prepare a directory {data_dir} for the dataset.

Second, download the CLEVR dataset, GQA dataset and CUB dataset, and move the image folders to the paths {data_dir}/CLEVR/raw/images, {data_dir}/GQA/raw/images, and {data_dir}/CUB/raw/images, respectively. Alternatively, you may run the command

sh ./scripts/dataset/download.sh {data_dir}

to automatically do this.

Finally, run the command:

sh ./scripts/dataset/prepare.sh {data_dir}

This script will download our dataset augmentation and organize all the data into the following structure:

{data_dir}
├── CLEVR
│   ├── raw
│   │   ├── images
│   │   │   └── ...
│   │   ├── COPYRIGHT.txt
│   │   └── LICENSE.txt
│   └── augmentation
│       ├── questions
│       │   ├── synonym_generalization
│       │   │   └── ...
│       │   ├── samekind_generalization
│       │   │   └── ...
│       │   └── ...
│       └── sceneGraphs.pkl
├── CUB
│   ├── raw
│   │   └── ...
│   └── augmentation
│       └── ...
└── GQA
    └── ...

The dataset directory is now all set. The {data_dir}/CUB and {data_dir}/GQA directories are organized similarly to {data_dir}/CLEVR, so some parts of the directory tree are omitted above.

The {data_dir}/{name}/raw directory generally contains the original data from the CLEVR, GQA or CUB sub-dataset, though slightly re-organized and tailored. The {data_dir}/{name}/augmentation/questions/metaconcept_questions directory contains the augmented metaconcept questions. These questions are generated according to the ground-truth metaconcept knowledge we extracted from the ontology of each dataset. To learn more about the ground-truth knowledge we used, you may find it helpful to explore our {VCML}/knowledge directory.
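
As an illustration, a snippet along these lines can peek into one of the augmented question files (a sketch only: the exact per-entry schema, including the 'question' field, is an assumption, so adapt it to what you find in the files):

import json
from pathlib import Path

data_dir = Path("/path/to/data_dir")  # the {data_dir} prepared above
qfile = (data_dir / "CLEVR" / "augmentation" / "questions"
         / "synonym_generalization" / "0" / "test_questions.json")

with open(qfile) as f:
    questions = json.load(f)

print("number of questions:", len(questions))
texts = {q["question"] for q in questions}  # assumes a 'question' text field
print("unique question texts:", len(texts))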

(If you would like to run the MAC baseline, you can download the feature files from here. Note that in order to fine-tune the ResNet, the file contains only raw images, and you will need to modify the MAC code to include a trainable ResNet module.)

About the augmented questions

As explained above, we augmented existing datasets with metaconcept reasoning questions, based on the ontology knowledge of the datasets. The following image serves as an overview of the final dataset, combining visual reasoning questions (for visually grounding Concepts) and text-only metaconcept questions (for learning Metaconcepts based on learned concepts).

On the one hand, visual questions provide visual cues for the concepts, which help the learning of Metaconcepts. On the other hand, Metaconcepts serve as abstract-level supervision, which may help the learning of visual concepts.

For further details, you are encouraged to check out the explanation in {VCML}/knowledge/.
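
To make the two question types concrete, here is a hypothetical pair in the style of the dataset (illustrative examples only, not actual entries):

# a visual reasoning question, grounded in an image
visual_question = {
    "question": "Is there a large shiny sphere in the image?",
    "image_index": 0,
    "answer": "yes",
}

# a text-only metaconcept question, with no associated image
metaconcept_question = {
    "question": "Is shiny a synonym of metallic?",
    "answer": "yes",
}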

Training and evaluation

Before running, you will need to prepare a directory {log_dir} to store the results (make sure it has sufficient space if you wish to keep the checkpoints and debugging information). The path to this directory will be passed to the --log_dir option when running.

All commands for reproducing the results in the paper are explained in {VCML}/scripts/commands.md. Here we show examples of how to use them.

Note: Make sure you are connected to the Internet before training, as the code might download pretrained weights for some components at the beginning.

To run a single experiment

You may run the following command to run a "Concepts help synonym metaconcept generalization" experiment on the CLEVR dataset.

jac-crun <GPU-id> scripts/main.py --mode run-experiment --task CLEVR --model VCML --experiment synonym_generalization --num_parallel 1 --log_dir {log_dir} --data_dir {data_dir}

You may replace --log_dir {log_dir} with --silent if you do not want to save logging files or checkpoints.

You will see detailed logging on the console:

0:00:00.00 | => scripts/main.py --mode run-experiment --task clevr --model vcml --experiment
           |    synonym_generalization --name clevr_SynGen --num_parallel 1
0:00:00.01 | => Current Time is: xxxx-xx-xx xx:xx:xx
0:00:00.03 | => Printing Arguments
0:00:00.77 | => balance_classification    : False
           |    batch_size                : 10
           |    box_scale                 : 1024
           ...
           |    
0:00:00.79 | => Loading sceneGraphs
0:00:11.75 | -----> Loaded sceneGraphs from: ...
           |        SceneGraphs size: ...
0:00:11.92 | -----> Loading referential-expression dataset from ...
           |        SceneGraphs size: ...
0:00:12.59 | => Loading Word_Index instances
0:00:12.72 | -----> loading Word_Index: words
           |        from file ..., length=xx
0:00:13.31 | -----> loading Word_Index: concepts
           |        from file ..., length=xx
0:00:13.62 | -----> loading Word_Index: metaconcepts
           |        from file ..., length=xx
           ...
0:00:14.44 | => Loading question dataset
0:00:14.73 | -----> Selecting test concepts: 
           ...

To run multiple parallel experiments

If you have multiple GPUs available on one machine, you may choose to run several experiments with different random seeds with one command. This is helpful if you wish to reduce and measure the variance of results. You should specify the GPUs you are using on the machine after jac-crun, and specify the number of parallel experiments in the --num_parallel option. For example:

jac-crun 0,1,2,3 scripts/main.py --mode run-experiment --task CLEVR --model VCML --experiment synonym_generalization --num_parallel 4 --log_dir {log_dir} --data_dir {data_dir}

To use pre-trained checkpoints

You can add the --pretrained argument to use our pretrained checkpoints. The checkpoints will be automatically downloaded and loaded.

For more details, you are encouraged to refer to {VCML}/scripts/commands.md, which contains all the commands needed to run experiments, along with a fuller explanation of option usage.

Model Implementation

For details about how the VCML framework is implemented, you may be interested in the code in models/model/vcml_model.py and models/nn/framework. The implementation of denotational probabilities is in models/nn/framework/functional.py and models/nn/framework/sub_functional.py.
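
As a rough intuition only (a minimal sketch of the general idea, not the actual code in models/nn/framework/functional.py), a denotational probability can be thought of as the probability that an object embedding falls within the denotation of a concept embedding:

import torch

def denotational_probability(object_emb, concept_emb, scale=10.0):
    # P(object is an instance of concept), modeled here as a logistic
    # function of cosine similarity, an assumption for illustration only
    sim = torch.nn.functional.cosine_similarity(object_emb, concept_emb, dim=-1)
    return torch.sigmoid(scale * sim)

objects = torch.randn(8, 64)            # a batch of 8 object embeddings
sphere = torch.randn(64).expand(8, 64)  # one concept embedding, broadcast
print(denotational_probability(objects, sphere))  # 8 probabilities in (0, 1)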

The semantic parser for questions is adapted from Kexin Yi's implementation of NS-VQA. Code for the semantic parser is in the reason/ directory.


vcml's Issues

Access forbidden to http://vcml.csail.mit.edu/data/ckpt/CLEVR_reason.tgz

Hi team, I have been trying to run your code using the example line:

jac-crun <GPU-id> scripts/main.py --mode run-experiment --task CLEVR --model VCML --experiment synonym_generalization --num_parallel 1 --log_dir {log_dir} --data_dir {data_dir}

But keep getting the error on:

--2021-03-18 14:21:33--  http://vcml.csail.mit.edu/data/ckpt/CLEVR_reason.tgz
Resolving vcml.csail.mit.edu (vcml.csail.mit.edu)... 128.52.128.175
Connecting to vcml.csail.mit.edu (vcml.csail.mit.edu)|128.52.128.175|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-03-18 14:21:34 ERROR 403: Forbidden.

I also tried adding the option '--pretrained', to the same effect. Would you be able to help out? It looks like it is trying to load the file from some MIT data directory which does not have general public access (I also tried accessing this file in the browser from my laptop [not from the cloud instance where I am running the code] and got the same response that it is forbidden).

Thanks!

"KeyError: 'program_parsed'" when running the program

Hi. I tried to test the pretrained model using the test data from the CLEVR dataset but got the error KeyError: 'program_parsed'. The command I typed is as follows.

jac-crun 0 scripts/main.py --mode run-experiment --task CLEVR --model VCML --experiment synonym_generalization --log_dir ../data/log --data_dir ../data/ --pretrained --in_epoch test

and the error message shown is as follows.

0:00:52.63 | => epoch 0
|
0:00:52.85 | -----> Testing
Traceback (most recent call last):
  File "scripts/main.py", line 44, in <module>
    main()
  File "scripts/main.py", line 34, in main
    run_experiment.run(args)
  File "/home/dule/Desktop/NN/AAAI_21/nuero-symbolic_ai/VCML/scripts/run_experiment.py", line 181, in run
    processes[0]._target(*processes[0]._args)
  File "/home/dule/Desktop/NN/AAAI_21/nuero-symbolic_ai/VCML/scripts/run_experiment.py", line 151, in ready_go
    train(coach, args)
  File "/home/dule/Desktop/NN/AAAI_21/nuero-symbolic_ai/VCML/scripts/utils/train.py", line 223, in train
    run_epoch(coach, args, coach.epoch)
  File "/home/dule/Desktop/NN/AAAI_21/nuero-symbolic_ai/VCML/scripts/utils/train.py", line 192, in run_epoch
    True, False
  File "/home/dule/Desktop/NN/AAAI_21/nuero-symbolic_ai/VCML/scripts/utils/train.py", line 142, in any_epoch
    inner()
  File "/home/dule/Desktop/NN/AAAI_21/nuero-symbolic_ai/VCML/scripts/utils/train.py", line 128, in inner
    loss, outputs = run_batch(data, model, args)
  File "/home/dule/Desktop/NN/AAAI_21/nuero-symbolic_ai/VCML/scripts/utils/train.py", line 82, in run_batch
    losses, outputs, debugs, objects = model(data)
  File "/home/dule/anaconda3/envs/VCML/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dule/Desktop/NN/AAAI_21/nuero-symbolic_ai/VCML/models/model/vcml_model.py", line 87, in forward
    program = data['program_parsed']
KeyError: 'program_parsed'

/home/dule/Desktop/NN/AAAI_21/nuero-symbolic_ai/VCML/models/model/vcml_model.py(87)forward()
85 program_encoded = data['program_encoded']
86 else:
---> 87 program = data['program_parsed']
88 program_encoded = data['program_parsed_encoded']
89

ipdb>

Any suggestions on how to fix this issue? Thank you very much!

Question repetition in test datasets (augmented)

Hi team, I have noticed that there is a high repetition of questions in the test datasets in the augmented data. In particular, I am looking at the synonym_generalization task, which reads data from /data_augmented/CLEVR/questions/synonym_generalization/i/, where data_augmented is the file I downloaded from http://vcml.csail.mit.edu/data/dataset_augmentation.tgz as per the instructions. I can see the following:

  1. File /questions/synonym_generalization/0/test_questions.json consists of 60000 questions, which are repetitions of the following 9 questions:
       ['Is small a synonym of sphere?', 'Is shiny a synonym of sphere?',
       'Is shiny a synonym of small?', 'Is sphere a synonym of small?',
       'Is small a synonym of shiny?', 'Is sphere a synonym of shiny?',
       'Is small a synonym of small?', 'Is shiny a synonym of shiny?',
       'Is sphere a synonym of sphere?']
  2. File /questions/synonym_generalization/1/test_questions.json consists of 60000 questions, which are repetitions of the following 9 questions:
       ['Is cube a synonym of cube?', 'Is metal a synonym of cube?',
       'Is ball a synonym of cube?', 'Is metal a synonym of ball?',
       'Is cube a synonym of ball?', 'Is ball a synonym of ball?',
       'Is metal a synonym of metal?', 'Is ball a synonym of metal?',
       'Is cube a synonym of metal?']
  3. File /questions/synonym_generalization/2/test_questions.json consists of 60000 questions, which are repetitions of the following 1 question:
     ['Is metallic a synonym of metallic?'] 
  4. File /questions/synonym_generalization/3/test_questions.json consists of 60000 questions, which are repetitions of the following 16 questions:
      ['Is metal a synonym of shiny?', 'Is shiny a synonym of shiny?',
       'Is ball a synonym of large?', 'Is metal a synonym of large?',
       'Is shiny a synonym of large?', 'Is ball a synonym of ball?',
       'Is ball a synonym of shiny?', 'Is large a synonym of shiny?',
       'Is large a synonym of large?', 'Is shiny a synonym of ball?',
       'Is large a synonym of ball?', 'Is metal a synonym of ball?',
       'Is metal a synonym of metal?', 'Is shiny a synonym of metal?',
       'Is ball a synonym of metal?', 'Is large a synonym of metal?']

Is this behaviour expected? I could not find any difference between these questions apart from 'question_index'. I would really appreciate your help on this.
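
For reference, the counts above can be reproduced with a short script along these lines (a sketch that assumes each test_questions.json is a JSON list whose entries carry a 'question' field):

import json
from collections import Counter

for i in range(4):
    path = ("data_augmented/CLEVR/questions/synonym_generalization/"
            + str(i) + "/test_questions.json")
    with open(path) as f:
        questions = json.load(f)
    counts = Counter(q["question"] for q in questions)  # unique question texts
    print("split", i, ":", len(questions), "questions,", len(counts), "unique")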

Parser always producing <END> when predicting synonym operation on test questions

Hi team, making it a separate thread from #1

I have been testing the parser separately (it is downloaded from http://vcml.csail.mit.edu/data/ckpt/CLEVR_reason.tgz and seems to be pre-trained, as it parses training questions well). While parser.tools.operation.records contains 'synonym' (and therefore the parser should be able to recognize this operation), it always predicts '10' (which is the <END> operation) when applied to the test questions, which all contain the word 'synonym'. For example, the question
'Is sphere a synonym of shiny?'
gets parsed into

# [[{'operation': '<END>', 'argument': 'shiny'},
#   {'operation': '<END>', 'argument': 'sphere'},
#   {'operation': '<END>', 'argument': '<END>'},
#   {'operation': '<END>', 'argument': '<END>'},
#   {'operation': '<END>', 'argument': '<END>'}]. 

Maybe the parser is in fact not pretrained on this concept?

Run on my own test data

I want to test the model on my own input images instead of downloading the whole dataset. What is the exact procedure (commands) I should follow?

Args object has no attribute 'cudas'

Hi,

I am trying to replicate your project, but when running your sample single-experiment command, I get an error that the 'cudas' attribute is not found. Could you please give me some help? Thank you!

Traceback (most recent call last):
  File "scripts/main.py", line 38, in <module>
    main()
  File "scripts/main.py", line 28, in main
    run_experiment.run(args)
  File "/home/VCML/scripts/run_experiment.py", line 164, in run
    device = args.cudas[i * args.num_gpus // args.num_parallel]
AttributeError: 'Args' object has no attribute 'cudas'

/home/VCML/scripts/run_experiment.py(164)run()
162
163 for i in range(0, args.num_parallel):
--> 164 device = args.cudas[i * args.num_gpus // args.num_parallel]
165 p = ctx.Process(target=ready_go,
166 args=(args, i, message[i], control[i],

ipdb>
