lupantech / scienceqa
Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".
License: MIT License
Hi expert, is it by design?
For example, the picture shows only some green plants in small trays, but actually solving the problem depends on language understanding.
"180":{ "question":"Which of the following was a dependent variable in this experiment?", "choices":[ "the temperature of the heating pad", "the number of days until a seed germinated" ], "answer":1, "hint":"The passage below describes an experiment. Read the passage and think about the variables that are described.\n\nKenneth wanted to grow cucumbers from seeds. He read that using a heating pad to heat up potting soil could help make seeds germinate, or sprout, faster. Kenneth wondered whether the temperature of the heating pad would affect how quickly the seeds germinated.\nKenneth prepared two potting trays, each made up of ten small pots of soil. He planted one cucumber seed in each small pot and arranged the potting trays near a sunny window. He set an electric heating pad to 75\u00b0F and placed it under one potting tray. He set a second heating pad to 85\u00b0F and placed it under the other potting tray. Kenneth observed the pots daily, and he counted the number of days it took until a seed germinated in each pot.\nHint: An independent variable is a variable whose effect you are investigating. A dependent variable is a variable that you measure.\nFigure: germinating plants in a potting tray.", "image":"image.png", "task":"closed choice", "grade":"grade6", "subject":"natural science", "topic":"science-and-engineering-practices", "category":"Designing experiments", "skill":"Identify independent and dependent variables", "lecture":"Experiments have variables, or parts that change. You can design an experiment to find out how one variable affects another variable. For example, imagine that you want to find out if fertilizer affects the number of tomatoes a tomato plant grows. To answer this question, you decide to set up two equal groups of tomato plants. Then, you add fertilizer to the soil of the plants in one group but not in the other group. 
Later, you measure the effect of the fertilizer by counting the number of tomatoes on each plant.\nIn this experiment, the amount of fertilizer added to the soil and the number of tomatoes were both variables.\nThe amount of fertilizer added to the soil was an independent variable because it was the variable whose effect you were investigating. This type of variable is called independent because its value does not depend on what happens after the experiment begins. Instead, you decided to give fertilizer to some plants and not to others.\nThe number of tomatoes was a dependent variable because it was the variable you were measuring. This type of variable is called dependent because its value can depend on what happens in the experiment.", "solution":"", "split":"test" },
Hello,
I am trying to fine-tune MiniGPT-4 on a student engagement dataset. Its labels are image IDs and captions describing the images. I managed to get it to perform okay on the dataset. However, it sometimes hallucinates or gives answers outside the set I want, since MiniGPT-4 is designed for free-form, open-ended VQA.
How did you limit the answer generation space to just the options you provided? Did the paper mention using a linear classifier at the end to restrict the output tokens to just those answer options?
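One common way to keep a model inside the given options is to score each option and pick the best one instead of generating free text. The sketch below illustrates that idea only; it is not the method from the paper. `pick_option` and `toy_overlap_score` are hypothetical names, and the toy scorer is a stand-in for whatever a real model would provide (for example, the log-likelihood of each option given the question).

```python
from typing import Callable, Sequence


def pick_option(question: str, choices: Sequence[str],
                score_fn: Callable[[str, str], float]) -> int:
    """Return the index of the highest-scoring choice (first one wins ties)."""
    if not choices:
        raise ValueError("choices must not be empty")
    scores = [score_fn(question, choice) for choice in choices]
    return max(range(len(choices)), key=scores.__getitem__)


def toy_overlap_score(question: str, choice: str) -> float:
    """Placeholder scorer (assumption): share of choice words found in the question.

    A real setup would replace this with a model-based score.
    """
    question_words = set(question.lower().split())
    choice_words = choice.lower().split()
    hits = sum(word in question_words for word in choice_words)
    return hits / max(len(choice_words), 1)


if __name__ == "__main__":
    choices = [
        "the temperature of the heating pad",
        "the number of days until a seed germinated",
    ]
    question = "How many days until a seed germinated?"
    print(pick_option(question, choices, toy_overlap_score))
```

Because the function always returns an index into `choices`, the prediction can never fall outside the provided options, whatever scorer is plugged in.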
Tony,
Thanks for your evaluation dataset. However, I found that the evaluation results on the leaderboard differ from those in the original paper. The leaderboard does not show the paper's two best results; it shows the ablation results instead. Why?
Original paper link: https://arxiv.org/pdf/2302.00923v4.pdf
Hey, I think your work is very meaningful. I saw in the paper that you experimented with the UnifiedQA model, but that part of the code is not currently available in the repo.
Will you release the code next?
Hi, this is a great dataset. Thanks for the hard work.
Instead of subject-level results, I am interested in topic-level evaluation results, such as biology, chemistry, etc., and thus I wonder:
Thanks.
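A minimal sketch of topic-level scoring, assuming each problem record carries the "topic" and "answer" fields seen in the problems.json example above. The `predictions` mapping (question id to predicted choice index) and the function name `accuracy_by_field` are hypothetical; the official evaluation script may organize its results differently.

```python
from collections import defaultdict


def accuracy_by_field(problems, predictions, field="topic"):
    """Group predictions by a metadata field and return accuracy per group."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for qid, predicted_index in predictions.items():
        problem = problems[qid]
        group = problem[field]
        totals[group] += 1
        correct[group] += int(predicted_index == problem["answer"])
    return {group: correct[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Toy records for illustration only, not real dataset entries.
    problems = {
        "1": {"topic": "biology", "answer": 0},
        "2": {"topic": "biology", "answer": 1},
        "3": {"topic": "chemistry", "answer": 2},
    }
    predictions = {"1": 0, "2": 0, "3": 2}
    print(accuracy_by_field(problems, predictions))
```

Passing `field="subject"` or `field="category"` would group by those fields in the same way.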
Hi expert, I've looked at problems.json, and the lectures do not look reasonable. How are the lectures generated? For example, for qid 5, the lecture is: "People can use the engineering-design process to develop solutions to problems. One step in the process is testing if a potential solution meets the requirements of the design. How can you determine what a test can show? You need to figure out what was tested and what was measured.\nImagine an engineer needs to design a bridge for a windy location. She wants to make sure the bridge will not move too much in high wind. So, she builds a smaller prototype, or model, of a bridge. Then, she exposes the prototype to high winds and measures how much the bridge moves.\nFirst, identify what was tested. A test can examine one design, or it may compare multiple prototypes to each other. In the test described above, the engineer tested a prototype of a bridge in high wind.\nThen, identify what the test measured. One of the criteria for the bridge was that it not move too much in high winds. The test measured how much the prototype bridge moved.\nTests can show how well one or more designs meet the criteria. The test described above can show whether the bridge would move too much in high winds."
However, the question is about Gordon's test.
Hey, awesome work! I wanted to make this more accessible by putting it on the Hugging Face Hub: https://huggingface.co/datasets/derek-thomas/ScienceQA
There were a lot of fields in the dataset card that I filled in as best I could. Would you consider reviewing it and, once it meets your expectations, adding a link to it in your GitHub repo?
Thanks,
Derek
Hi, it would be great to add LLaMA-SciTune (developed on top of the LLaVA architecture with scientific visual-language data) results to the leaderboard.
It seems that the file cannot be downloaded from the Google Drive link. Is there a OneDrive download link?
First of all, thank you for open-sourcing a very good dataset.
In the image above from your official website, I am a little confused about the "context" in the red box. I didn't find a matching key in the "problems.json" file you provided. Can you tell me which parts make up "context"?
I would like to incorporate your dataset into my multimodal work. I would be very grateful if you could reply.
Thanks for your awesome work! It paves the way towards multimodal reasoning agents. I noticed that the questions are collected from IXL Learning. Since IXL Learning is a website, would you mind explaining in detail how you got the data from it, and how you processed the crawled data?
Thanks in advance :)
Hello,
Thank you for this great project.
Could you add the results of Honeybee? (code: https://github.com/kakaobrain/honeybee)
Both results are based on the 13B models.
Thanks!
Hi, nice work!
My questions are: what is the prompt for the GPT-3 zero-shot setting, and how do you ensure that the model's output conforms to a parsable form?
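One way to handle unparsable output is to ask for a fixed answer phrase in the prompt and then extract the option letter with a regular expression, treating anything that does not match as a failure. The sketch below assumes the phrase "The answer is X" and at most five options; both are assumptions for illustration, and `parse_answer` is a hypothetical helper rather than the repository's parser.

```python
import re

# Matches "The answer is B", "The answer is (B)" or "The answer is B."
# The lookahead rejects words such as "Because" that merely start with A-E.
ANSWER_PATTERN = re.compile(r"The answer is \(?([A-E])(?![A-Za-z])\)?")


def parse_answer(output: str, num_choices: int):
    """Return the 0-based option index, or None if the output is unparsable."""
    match = ANSWER_PATTERN.search(output)
    if match is None:
        return None
    index = ord(match.group(1)) - ord("A")
    # A letter beyond the available options also counts as unparsable.
    return index if index < num_choices else None


if __name__ == "__main__":
    print(parse_answer("The answer is B. BECAUSE: it was measured.", 2))
    print(parse_answer("I would pick the second option.", 2))
```

What to do with a `None` result (count it as wrong, retry, or fall back to a default option) is a separate design choice that should be reported alongside the accuracy.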
I believe the AWS train dataset is missing 24, but google drive had it.
It would be great if the Multimodal-CoT model (paper) could be added to the Leaderboard page.
Hi, I load the dataset using the following command:
import datasets

data = datasets.load_dataset('derek-thomas/ScienceQA', 'test')
for sample in data['test']:
    sample['image']
But sample['image'] is a dictionary with the keys 'bytes' and 'path', not a PIL image, and I don't know how to process it.
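A possible way to handle this, assuming the dictionary holds the encoded image file under 'bytes' (or a readable file location under 'path') and that Pillow is installed: wrap the bytes in an in-memory buffer and open it with `PIL.Image.open`. The helper name `to_pil` is hypothetical, and the handling of samples without an image is a guess at how such records appear.

```python
import io

from PIL import Image


def to_pil(image_field):
    """Convert a {'bytes': ..., 'path': ...} dict into a PIL image.

    Returns None when the sample has no image, and passes through values
    that are already PIL images.
    """
    if image_field is None:
        return None
    if isinstance(image_field, Image.Image):
        return image_field
    data = image_field.get("bytes")
    if data is not None:
        # Decode the encoded file (PNG, JPEG, ...) from memory.
        return Image.open(io.BytesIO(data))
    path = image_field.get("path")
    if path:
        return Image.open(path)
    return None


if __name__ == "__main__":
    # Build a tiny PNG in memory to stand in for a dataset sample.
    buffer = io.BytesIO()
    Image.new("RGB", (4, 3), color=(0, 128, 0)).save(buffer, format="PNG")
    sample = {"image": {"bytes": buffer.getvalue(), "path": None}}
    print(to_pil(sample["image"]).size)
```

In the loop above, this would be used as `image = to_pil(sample['image'])`. If the column is an image feature with decoding turned off, casting it with `data.cast_column('image', datasets.Image())` may also return PIL images directly, though that has not been checked against this dataset.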