I found another small incompatibility. The original CLEVR key for the type of a functional-program node was called "function", while the GitHub key is "type". So I wrote a small conversion script:
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument('--input_questions_file', default='../clevr-output/CLEVR_questions.json')
parser.add_argument('--output_questions_file', default='../clevr-output/CLEVR_fixed_questions.json')


def main(args):
    # Load questions
    with open(args.input_questions_file, 'r') as f:
        question_data = json.load(f)
    info = question_data['info']
    questions = question_data['questions']
    print('Read %d questions from disk' % len(questions))

    # Rename 'type' to 'function' in every program node
    for q in questions:
        for p in q['program']:
            p['function'] = p.pop('type')

    # Dump new dict
    with open(args.output_questions_file, 'w') as f:
        print('Writing output to %s' % args.output_questions_file)
        json.dump({
            'info': info,
            'questions': questions,
        }, f)


if __name__ == '__main__':
    main(parser.parse_args())
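For a quick sanity check, here's a toy example (with hypothetical sample data, not a real CLEVR question) of the same rename the script performs:

```python
import json

# Hypothetical minimal question entry in the GitHub output format,
# where each program node uses the key 'type'.
question = {
    "question": "How many red cubes are there?",
    "program": [
        {"type": "scene", "inputs": []},
        {"type": "filter_color", "inputs": [0], "value_inputs": ["red"]},
    ],
}

# Apply the same rename as the script above: 'type' -> 'function'.
for p in question["program"]:
    p["function"] = p.pop("type")

# Each node now carries 'function', matching the released CLEVR_v1.0 keys.
print(json.dumps(question["program"], indent=2))
```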
from clevr-dataset-gen.
The question generation script here on GitHub is largely the same as the code used to generate CLEVR -- mostly I added documentation, removed dead code that wasn't being used anymore, and changed some of the JSON keys to have better names.
Here's the original generation code for you to compare against in case there are other differences that I can't remember:
https://gist.github.com/jcjohnson/6fb119a0372166ec9f4f006a1242a7bc
In the original code (L710) "template_idx" is also the index of a template within a file, much like "question_family_index" in the GitHub version of the file.
There was another script that converted the output from the original generation script into the format that we released as CLEVR_v1.0, which changed the names of JSON keys ("text_question" -> "question", "structured_question" -> "program"). Unfortunately after digging around today I wasn't able to find this conversion script.
However I suspect that the conversion script also changed the semantics of "template_idx" / "question_family_index" to be an overall index of the template (between 0 and 89) rather than the index of the template within the file; in hindsight this was clearly a mistake since it makes it tough to figure out which template was used to generate which question.
Thankfully the templates originally used for question generation have exactly the same structure as the ones on GitHub, so the only source of nondeterminism is the order in which the JSON files are loaded (since this order depends on os.listdir, which I think can give different orders on different filesystems).
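For anyone who wants to make that load order deterministic going forward, a minimal sketch (using a throwaway temp directory with toy template files, not the real CLEVR templates) is to sort the listing before loading:

```python
import json
import os
import tempfile

# Hypothetical toy setup: write two small template files to a temp directory.
tmp = tempfile.mkdtemp()
for name, texts in [("zero_hop.json", ["t0"]), ("one_hop.json", ["t1", "t2"])]:
    with open(os.path.join(tmp, name), "w") as f:
        json.dump([{"text": [t]} for t in texts], f)

# sorted() makes the load order independent of the filesystem,
# unlike a bare os.listdir(), whose order is not guaranteed.
templates = []
for fn in sorted(os.listdir(tmp)):
    with open(os.path.join(tmp, fn)) as f:
        for i, t in enumerate(json.load(f)):
            templates.append((fn, i, t))

# 'one_hop.json' sorts before 'zero_hop.json', so this order is stable everywhere.
print([(fn, i) for fn, i, _ in templates])
```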
To fix this issue, I manually matched up values of "question_family_index" from the released CLEVR_v1.0 data to the text templates from the JSON files, and found that you can recover the template for each question if you load them in this order:
- compare_integer.json
- comparison.json
- three_hop.json
- single_and.json
- same_relate.json
- single_or.json
- one_hop.json
- two_hop.json
- zero_hop.json
Here's a little script that shows how to recover templates from the released questions: it loads templates in this order, randomly samples some questions, and prints out the text of each question as well as its recovered template:
https://gist.github.com/jcjohnson/9f3173703f8578db787345d0ce61002a
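In the same spirit as that gist, here is a minimal sketch of the index arithmetic, using the file order above but with illustrative per-file counts (NOT the real CLEVR template counts) so it runs without the template files on disk:

```python
# The recovered load order; question_family_index counts templates
# consecutively across the files in exactly this sequence.
TEMPLATE_ORDER = [
    "compare_integer.json", "comparison.json", "three_hop.json",
    "single_and.json", "same_relate.json", "single_or.json",
    "one_hop.json", "two_hop.json", "zero_hop.json",
]

# Illustrative counts -- pretend each file holds 2 templates, just to
# show how the overall index decomposes into (file, index within file).
counts = {fn: 2 for fn in TEMPLATE_ORDER}

mapping = {}
idx = 0
for fn in TEMPLATE_ORDER:
    for i in range(counts[fn]):
        mapping[idx] = (fn, i)
        idx += 1

# Under these toy counts, index 3 is the second template of comparison.json.
print(mapping[3])
```

With the real template files, the inner loop would instead enumerate the entries of each JSON file, yielding the mapping from question_family_index back to the generating template.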
In the process of figuring this out, I realized another slight inconsistency between the original code and the GitHub code: we changed the wording of the "same_relate" templates to be less ambiguous (in particular adding "other" or "another"), but the semantics of these templates are exactly the same. Here are the old versions of those templates:
https://gist.github.com/jcjohnson/09541f3bcb32e73e0ba47c57d09f3f6e
Thanks a lot, @jcjohnson!
Regarding the inconsistency between the original code and the GitHub code: can you please clarify which code was used to generate the widely used CLEVR distribution? I just checked, and found that CLEVR_val_questions.json does contain questions of the form "What size is the other ...", meaning that it was probably the newer version of your templates that was used to generate it, not the one you linked as a GitHub gist.