Comments (10)

danyaljj commented on September 8, 2024

While doing this, could you also check the human predictions (the last tab):
https://docs.google.com/spreadsheets/d/1wXStPurP6AamxvglOw0aOJ7V1DiSKczJNyFcyfDnk4w/edit?usp=sharing

See if there is anything that can be improved in the instructions in order to improve human annotators' understanding of each task.

yeganehkordi commented on September 8, 2024

Sure, will do!

danyaljj commented on September 8, 2024

In this negative example, should the answer be B or A?
https://github.com/allenai/natural-instructions/blob/master/tasks/task1393_superglue_copa_text_completion.json#L50-L54

Also, the language 'So, the correct output should be "B"' is ambiguous (it's not clear whether "B" is a good answer or not) -- we need to improve it.
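In case it helps with the review, here is a minimal Python sketch for printing those negative examples; it assumes the usual task layout with a "Negative Examples" list of input/output/explanation entries, so treat the field names as assumptions rather than a definitive reading of the file.

```python
import json

# Minimal sketch (assuming the task file keeps a "Negative Examples" list of
# entries with "input", "output", and "explanation" fields) to print the
# negative examples of task1393 for review.
with open("tasks/task1393_superglue_copa_text_completion.json") as f:
    task = json.load(f)

for i, ex in enumerate(task.get("Negative Examples", []), start=1):
    print(f"Negative example {i}")
    print("  input:      ", ex["input"])
    print("  output:     ", ex["output"])
    print("  explanation:", ex["explanation"])
    print()
```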

danyaljj commented on September 8, 2024

Someone suggested that:

there are several tasks that mention "neural" (there was a choice of 3 options) but it seems that it actually wants to mean "neutral". (It's used more than once, so I'm not positive).

It would be great to check the mentions of "neural" in the data.
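A rough sketch of that check (assuming tasks live under tasks/ and keep their instructions in a "Definition" field that is either a string or a list of strings; both details are assumptions about the current layout):

```python
import json
from pathlib import Path

# Rough sketch: list task files whose instructions mention "neural",
# which may be a typo for "neutral".
for path in sorted(Path("tasks").glob("*.json")):
    task = json.loads(path.read_text())
    definition = task.get("Definition", "")
    if isinstance(definition, list):
        definition = " ".join(definition)
    if "neural" in definition.lower():
        print(path.name)
```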

danyaljj commented on September 8, 2024

More feedback:

  • task033_winogrande_answer_generation.json
    • Was helpful but it needs to be clear what the target output is.
    • Indicate if two words are allowed, like "the store" or only a single word.
    • The word "the" is misspelled.
  • task034_winogrande_question_modification_object.json
    • Way too much reading in the instructions, when it can mostly be boiled down to replacing one word, i.e. the 'trigger word', with its antonym (e.g. changing 'small' to 'big'). Almost everything else is explained in the examples.
  • task035_winogrande_question_modification_person.json
    • An example where the changed word isn't inappropriate or anything like that, but simply doesn't flip the meaning well enough, could be useful. Also, it would be useful to know whether the 70% overlapping words need to appear in the same order or just be present in the response.
    • The instructions don't address whether we're allowed to use generic names at all, like Jane or John Doe, or whether we should just stick with the PersonX and PersonY names. Also, are we allowed to change the context word?
    • I understood the instructions clearly and have a sense of what the task will entail. However, I cannot confirm the instructions were good until I perform the task and see how it compares to my expectations. Often the instruction examples on these types of tasks end up far more straightforward and easy compared to the actual tasks, so I'm aware of that possibility. For example, these examples all change a single word, while the instructions mention a 70% overlap floor that suggests more than one word will need changing at times. Basically, more difficult/ambiguous examples would always help in these types of tasks. After completing the task I would say that yes, a 'good' example more complicated than a single-word change would help. Another note would be to put more emphasis on the context word and provide a 'bad' example that removes it. I had one where the context word could potentially be used as a trigger word, and I could see somebody accidentally replacing it (example 7).
    • The only examples that provide more than one highlighted response are labeled in the 'negative example' category, but it is not really clarified that you cannot give two responses for an output. Furthermore, the 'positive example' category only highlights one correct answer, while the adjoining explanation states that more than one is correct. So it's not really defined whether you can provide more than one output, which is a bit confusing, because on certain prompts there can definitely be more than one output. Also, I would like some guidance for when there seems to be no valid output in the two sentences -- it's not mentioned.
  • task1152_bard_analogical_reasoning_causation.json
    • More examples and more explanation. Some people might be confused by the short instructions.
    • simple instructions, clear and concise examples. Best I've seen so far.
    • I think it's relatively straightforward and having both negative and positive examples helps.
  • task1153_bard_analogical_reasoning_affordance.json
    • The instructions are good and the explanations for the examples are great! I would expand on what exactly an affordance relation is, however.
  • task1154_bard_analogical_reasoning_travel.json
    • seems easy
    • Regarding Example 2 (Input: "airport : car. bermuda : ?", Output: "bicycle"): you can travel around Bermuda by bicycle, though, so why is this not correct?
  • task1155_bard_analogical_reasoning_trash_or_treasure.json
    • seems good as stated
    • Nothing
  • task1156_bard_analogical_reasoning_tools.json
    • I thought they were helpful especially with the wash/sink example for the negative/bad.
    • Good instructions, seems clear, good examples. More "tricky" examples could be helpful though; for example, I am really struggling with the "cut : glass" analogies below - the rest are pretty easy.
    • The instructions are clear and the examples are helpful. Even the explanations, while basic, are pretty good in this. One of the better ones I've seen on this project so far. Still could probably use more negative examples.
    • It seemed weird that the same word "mop" was used again in one example.
  • task1157_bard_analogical_reasoning_rooms_for_containers.json
    • nothing I can think of. If you read it you should get it.
    • So this one is open ended? Confusing instructions. Way too cryptic.
  • task1158_bard_analogical_reasoning_manipulating_items.json
    • maybe use words instead of :
    • The instructions are relatively clear, but the examples and their explanations fail to remove any ambiguity. The examples in particular need to further explain the nature of the relationships being sought, rather than just saying this is right/wrong because object X can/cannot do that.
  • task1159_bard_analogical_reasoning_containers.json
    • I think that they explained the task to me very well.
    • The instruction in this one was really confusing to me, but the examples made perfect sense!
  • task1161_coda19_title_generation.json
    • Should be more detailed about how lengthy or in depth the title should be
  • task133_winowhy_reason_plausibility_detection.json
    • I would like to see more examples of bad outputs
    • I didn't see anything that should be corrected.
    • Make it clear that "correct" or "wrong" is all we need to write- not an explanation, not anything else.
  • task1342_amazon_us_reviews_title.json
    • This is just way too subjective.
    • The instructions are pretty clear except it could be more clear how long or short of an answer is preferred. I do not think anything needs improvement so far.
  • task1344_glue_entailment_classification.json
    • At first I thought the negatives were provided by AI and would be part of the task, as opposed to examples of doing the task wrong. The large box suggests that an explanation is required, but the instructions only ask for an output, which, according to the above, is only a 1 or 0.
  • task1345_glue_qqp_question_paraprashing.json
    • It might help if, instead of "input"/"output", you say "question"/"rephrase".
  • task1356_xlsum_title_generation.json
    • Says title should be short but doesn't explicitly say how short/long
    • These are really too long and difficult.
    • A few more examples would be helpful.
  • task1358_xlsum_title_generation.json
    • A few more examples couldn't hurt.
    • Hopefully this is worth my time, unlike your extremely confusing legalese hit I just returned after working halfway through it for free.
    • Says should be less than 20 words long but should also say how long it needs to be at minimum.
  • task1385_anli_r1_entailment.json
    • Examples are quite helpful; the instruction is quite brief but the examples make up for it. The unclear part: should I write explanations for my output?
    • I think in the positive examples you should just reconfirm what the right answer is. I know that's a given, but it helps if you state explicitly, "and so the correct answer is..." just to reinforce the correct interpretation.
    • For people who don't understand, the instructions should probably include what entailment and contradiction mean.
  • task1386_anli_r2_entailment.json
    • The first sentence says "th" instead of "the."
    • The instruction was pretty helpful and the examples were pretty helpful.
    • some of them seem to contradict each other in their logic
  • task1387_anli_r3_entailment.json
    • The instruction is so clear with the examples, no need to improve.
    • none
    • I had to google what entailment means. Better to just use deduction or implication.
    • I understand it somewhat but it's still cryptic.
  • task1388_cb_entailment.json
    • nothing
    • The extraneous utterances could have been eliminated from the dialogues.
    • The word "the" is misspelled in the instructions.
    • every time requesters ask for entailment-related questions I want to bang my head on the table. You never ever explain what that means.
    • Solid, concise instructions. Could've used more examples, but those that were included were explained well enough. Should've included a thorough explanation of the nature of each of the three relationships.
  • task1390_wscfixed_coreference.json
    • Instructions were clear and simple, and the examples were a mixed bag. Most of the positive explanations were good, although the explanation for example 3 basically just says this is correct because it is correct. Same with the negative explanations; they don't do anything beyond say the example is wrong.
  • task1391_winogrande_easy_answer_generation.json
    • I think they were fine, enough to determine what to do for the task.
    • The third word in the first sentence of the instructions is misspelled.
  • task1393_superglue_copa_text_completion.json
    • In the examples of bad outputs, example 2 has the correct output already, so it doesn't make sense at all to have it as a bad output!
    • The instructions seem clear and the examples were pretty helpful.
    • No issue in the instructions.
  • task1394_meta_woz_task_classification.json
    • It's good to me!
    • No improvements; good instructions.
    • The word "the" is misspelled.
  • task1407_dart_question_generation.json
    • There needs to be a bit more about forming the questions; this is a pretty complex task, and would benefit from more examples and elaboration. Also, this task is woefully underpaid, taking nearly 40 minutes, though a good 10 of that was reading and rereading the instructions.
    • Way too convoluted. If I'm reading it correctly, you basically take the inputs to form a sentence, but replace one of the inputs with a blank line to create a fill in the blank sentence with the triplets.
    • There has to be a more simple way to describe this. One of the examples contains a number and it's unclear if that's preferred or an error.
    • The RDF triplets are confusing in bad example 2. If possible, can you simplify the triplets. Thanks.
  • task1409_dart_text_generation.json
    • How are we supposed to figure out what any of the triples mean? Like, you give negative examples, but what does stuff like "preliminary ranks" even mean? How would I even look that up? It would be good if you gave the negative example and THEN corrected it so it's correct - so we can see how to change a wrong answer into a right answer. I also know NOTHING about sports, so trying to figure out what the triples even mean is impossible. What do those numbers mean? I don't know, and it's unfair to expect that.
    • I thought the examples were pretty useful, but I might consider adding more--especially more negative examples.
    • the examples were really helpful in figuring out what we needed to write for the output
    • The instructions mention RDF triples, which makes it sound like a new concept has just been introduced but is in fact another name for just triples. Had to google to figure it out.
    • understandable
  • task1415_youtube_caption_corrections_grammar_correction.json
    • I would need to google what "stem-based" and "intra-word" mean to fully understand how to fix those errors. Simplify it. Should we remove sentence fragments from the end of the input stream, or keep them as is? It says digits need to be normalized; does that include years?
    • The sentences seem really awkward so it's a little confusing if the sentences need to be fully edited or not. The examples are way too simple compared to the actual task.
    • Everything is very good.
  • task1439_doqa_cooking_isanswerable.json
    • Go into more detail about the purpose of this study. Include a video guide tutorial. Simplify the paragraph by separating text to make it more readable (separate questions from answers). Remove toxic wording such as "garbage text"; it comes off as unwelcoming and unprofessional.
    • understandable
  • task1442_doqa_movies_isanswerable.json
    • seems understandable
    • N/A
    • No need , the instruction is already pretty good.
    • none
  • task1516_imppres_naturallanguageinference.json
    • The instructions and examples are pretty helpful.
    • The example inputs aren't grammatical and don't make sense. The explanations make some sense for the positive ones. However, I suggest that the examples just be called correct and incorrect; "positive"/"negative" makes it seem like we'll need to actually mark them negatively, when instead it means these are the types of work we want to avoid doing. Also, example three of the bad ones is wrong: both the premise and hypothesis mention waitresses.
  • task1529_scitail1.1_classification.json
    • nothing
    • I can't think of a way to improve them.
    • A bit more explanation or examples might help but still very useful.
    • No need everything is pretty good.
    • None
  • task1531_daily_dialog_type_classification.json
    • I would think a few more examples would help. Maybe coloring in of key words in the example to help determine the proper response that is sought.
    • The second negative example says it should be marked as a 3 instead of a 2, but 3 and 2 are not defined. If it means the third prediction type listed, that would be "question", which is incorrect. Negative example 3 has the same issue. Some examples of "unknown" would be very helpful as well.
    • having the definition of each answer type would have been good too
  • task1533_daily_dialog_formal_classification.json
    • both the instruction and examples are clear and helpful, because the task itself is complicated.
    • none
  • task1534_daily_dialog_question_classification.json
    • Instead of 1/0 it should be less confusing - 'yes' or 'no'. The examples below are also confusing because all of the examples above are only of SINGLE sentences, whereas the actual examples we're doing are multiple sentences in the first turn. Do I put "1" if it's two sentences and the first sentence is a question, like "Hi. How are you?" Is that a 1? Or would "How are you? Hi." be a 0 because it doesn't end in a question? Garbage.
  • task1540_parsed_pdfs_summarization.json
    • A lot of text, and it took me a bit to see how you got the output. Highlighting the key phrases in the paragraph that were used to make the output would be helpful for showing how the process works.
    • Should give clearer instructions as to the length of the headlines to generate.
  • task1554_scitail_classification.json
    • At first I was confused on the undesirable output examples, I didn't understand at first that they were incorrect choices by workers. Maybe somehow stating that these choices would be wrong might make it more clear.
    • I think they are clear and make sense. Nothing to improve in my opinion.
    • The instruction is clear and the examples are pretty helpful.
  • task1557_jfleg_answer_generation.json
    • None of the examples actually explain why something is grammatically correct lmao
    • The examples made this very clear, even if the paragraph instruction was a little confusing.
  • task1562_zest_text_modification.json
    • I would like to see more examples of what is right and wrong. Otherwise, that was a pretty good set of instructions for a task like this.
    • Typo in the first sentence: "th" should be "the".
  • task1586_scifact_title_generation.json
    • nothing
    • none
    • Instructions should include a min/max for the number of words/characters sufficient to be called a title
  • task1598_nyc_long_text_generation.json
    • Highlight what must be included.
    • Probably include more example in future.
    • I'd add some color-coding to the text in the examples to show where they fit in. Also, what is the price? Is it per dish or for the average check?
  • task1612_sick_label_classification.json
    • Says that answers must be in the form of letters, when in fact they must be in the form of numbers 0, 1, or 2
  • task1615_sick_tclassify_b_relation_a.json
    • none
    • The 3 labels are not natural phrases and hard to remember. Putting them closer to the actual work would be helpful since we have to copy/paste them instead of using easier radio buttons.
    • The third word in the first sentence of the instructions is misspelled.
  • task1622_disfl_qa_text_modication.json
    • The bad examples are confusing
  • task1624_disfl_qa_question_yesno_classification.json
    • Solid instructions. However, the explanations are somewhat unclear, since they really just restate that the example is correct or incorrect, but in more words. They need some actual explanation as to why the examples are correct or incorrect.
    • The instructions are a bit confusing and feel a bit bogged down, but I think I got it.
  • task1631_openpi_answer_generation.json
    • The instructions were fine although a bit minimal, and would've needed strong example explanations to make them clear. However, the output sentences in the example fail to follow the instruction requirements of generating grammatically correct output, and the explanations need to be more thorough.
  • task1640_aqa1.0_answerable_unanswerable_question_classification.json
    • The directions are clear and coherent
    • Pretty good; could have more negative examples, or "tricky"/difficult ones. Some of the "questions" below are poorly formed, incomplete, don't really make sense, or are otherwise unclear... I said false in those cases.
    • The word "the" is misspelled in the instructions.
    • Good instructions, but the structuring seems a bit of an issue; why not have the question listed before the input?
  • task1659_title_generation.json
    • I wish there were a few more examples of what you are looking for.
    • none
    • Does not mention how long a title must be. Can it be multiple sentences? Can it just be a rephrasing of a single sentence in the summary?
    • A few more examples
  • task1664_winobias_text_generation.json
    • Make the task window larger. Better explain when "the" should be included. Does "his" count?
    • Confusing, because 'the' by itself is not a coreference; the coreference is the entire phrase 'the [receptionist/dog/whatever]'.
    • I would maybe add in an example of a coreference word in the instructions. Though of course what they are is fully understood once you read the examples. The instructions also do not say how they should be formatted in the output ie numbered, separated by commas, etc. Again, it is shown in the examples that you want them separated by commas, but I believe that should be explicit in the instructions.
    • I don't know how to make it better actually, but the ambiguity level is fairly high with a lot of these sentences, so I made a lot of assumptions based on stereotypes and gut instinct. I hope I was correct most of the time!
  • task1728_web_nlg_data_to_text.json
    • some of the tuple names make no sense, and asking people to use an outside source is ridiculous. don't do that. if I don't even know what I'm looking at, how am I supposed to figure out what it means?
    • I think some more examples would be helpful, especially examples that would be helpful understanding more confusing or less clear inputs.
    • Pay is horrendous here and should the output be a sentence?
  • task190_snli_classification.json
    • Everything was fine.
    • Example 19 has a typo in sentence one: it has "noy" instead of "boy".
    • Please read th following instructions
    • The instruction and examples are pretty helpful.
    • The word "the" is misspelled in the instructions.
  • task199_mnli_classification.json
    • I am really confused by the last answer in the negative examples. If the two sentences agree with each other, why is that a "no" instead of a yes?
    • The instructions are solid, but the examples are sort of weak. The explanation for positive example 4 could use some further explanation. We definitely need more negative examples, as well.
    • could use a few more, those are not that great
  • task200_mnli_entailment_classification.json
    • Just to clarify, we are not supposed to input anything aside from the number as indicated by (Indicate your answer as 1,2, or 3 corresponding to the choice number of the selected sentence.)
    • The instruction was clear and examples are very helpful.
  • task201_mnli_neutral_classification.json
    • Everything looks good. More examples are always helpful! Extra explanations of why answers are right or wrong can be helpful too on the more confusing ones.
    • The examples for the most part were clear and understandable. Some may have been confusing but still defined what the examples below are looking for.
    • the instruction seems pretty clear and the examples are pretty helpful
    • In the instructions you have neural a couple of times instead of neutral
  • task202_mnli_contradiction_classification.json
    • Listing the statements one per line, such as 1), 2), 3), would make it easier to keep them separate.
    • Please read th following instructions
    • The word "the" is misspelled.
  • task219_rocstories_title_answer_generation.json
    • The written instruction was great. A lot of creativity is involved so the examples weren't as helpful, but I wouldn't remove them.
    • Typo right at the start: "Please read th following". In the examples to work on: "Joe listened to music well he cleaned".
    • none
  • task220_rocstories_title_classification.json
    • pretty clear, more 'tricky' or rare cases could be helpful
    • I thought the instructions were clear. The examples helped a lot too.
  • task226_english_language_answer_relevance_classification.json
    • The examples shown are very long to read and confusing
    • Really, just looking at the questions below (the actual task) helped more than the instructions. They just confused me. But I did understand I'm only to write "yes" or "no."
  • task232_iirc_link_number_classification.json
    • The instruction and examples were pretty clear. No need to improve.
    • very useful
  • task233_iirc_link_exists_classification.json
    • The instructions themselves would probably be okay, but the examples actually make the task more confusing, mostly due to unclear explanations of several of the examples. I'd expand upon the explanations for desirable examples 2 and 3, as well as negative example 1.
    • I would say slightly lengthier instructions as opposed to additional examples.
    • No need to improve the instruction.
    • NONE
  • task242_tweetqa_classification.json
    • none
    • The explanation of example 2 in the list of bad outputs says that the sample is labeled 'yes', but it is labeled 'no'.
    • The examples should be radio buttons.
    • Everything was fine.
    • This example appears to be wrong: Input: "Context: Thank you to my loved ones and those that have been there for me, who have picked me up and helped me through everything. Oscar Pistorius (@OscarPistorius) August 8, 2014 Question: why does zendaya look gorgeous? Answer: because she speaks her mind." Output: "no". Explanation of the example: "Here, the generated label should be 'no' because this context is not helpful in answering the question. Still this sample labeled as 'yes' which is wrong."
  • task249_enhanced_wsc_pronoun_disambiguation.json
    • More examples can be provided. Also, I feel that the example with the saucepan is a bit confusing.
    • The instruction is so clear, no issues here.
  • task281_points_of_correspondence.json
    • Set things up so you can easily copy and paste the answer; include pre-filled "Sentence 1", "Sentence 2", etc., with blanks or space to fill in, to add clarity to the directions. This batch is supremely underpaid - the HITs take forever, and do we really need to do this 20 times to get to the meat of the issues? The hourly rate is below minimum wage in almost every country, which is embarrassing.
    • It's pretty clear but maybe the instructions should state how we are to format the output. However, this question is answered in the examples, so maybe I'm just being picky
    • Adding "Sentence" to the examples made it confusing; just list them as 1. 2. 3. Also, the examples are bad and the task is confusing as a whole. It needs to be clearer and to pay way better.
    • The instructions seem rather open to interpretation.
  • task288_gigaword_summarization.json
    • They're not really instructions... just examples. We have to learn by your example, which could lead to confusion for some. We're all only human out here (which is what you wanted - human data).
  • task290_tellmewhy_question_answerability.json
    • Nope everything was pretty good.
  • task304_numeric_fused_head_resolution.json
    • Everything was helpful.
    • a little confused when to use REFERENCE
    • I would like more bad examples of how you can confuse something referred to in the text.
  • task329_gap_classification.json
    • "Please read th following instructions"; "pronoun in the text is showed within".
    • I think they explain the task pretty clearly.
  • task330_gap_answer_generation.json
    • The instructions are clear, if a bit short. The examples are helpful, although the explanations are basically saying "this answer is correct/incorrect" without really expanding on why, which could help. That being said, the task is pretty simple so it's not totally necessary to have amazingly detailed examples
    • It's perfect.
  • task349_squad2.0_answerable_unanswerable_question_classification.json
    • Please read th following instructions
    • none
    • No need, everything is pretty good here.
    • There should be an option to label it neutral if the inputs are nonsensical. Or at least instructions on what to do if an input doesn't make sense.
  • task362_spolin_yesand_prompt_response_sub_classification.json
    • This is perfectly explained; it doesn't seem like any more needs to be there. I was a bit confused, but after going over the instructions it makes sense.
    • The example one seems a bit convoluted in relation to the prompt. A bit confusing there.
    • I get the task, but the instructions are nearly meaningless, for "Yes, and".
    • There is an unnecessary "1" in the instructions before "In short".
  • task391_causal_relationship.json
    • nothing needs improvement
    • This one makes sense.
  • task392_inverse_causal_relationship.json
    • Please read th following instructions
  • task393_plausible_result_generation.json
    • looks good to me
  • task401_numeric_fused_head_reference.json
    • the instruction was plain, simple and understandable
    • There's simply not enough instructions, and the grammar is poor in some of them as well (positive example explanation two, for instance)
    • Instructions and examples are clear and helpful. Not sure if the response has to be a single word.
  • task402_grailqa_paraphrase_generation.json
    • "The museum director of the [Science Museum, London] is?" how is this a grammatically correct sentence? if even the examples are wrong idk how you expect people to give you the right answers
    • I have an issue with the first example: "Which apis includes the protocol of [JSON]?" I'd probably mark this as non-fluent because the verb is singular and the subject is plural; it should be "Which apis include the protocol of [JSON]?" Perhaps you want us to be forgiving of errors like that, though; it's hard to be sure. The missing "e" in the "th" in the first line of the instructions doesn't affect my understanding of the task, so I don't really have a problem with that. Frequently we work with requesters who are not native English speakers (or just not great proofreaders), and minor mistakes in instructions are perfectly understandable. One more thing to think about, as I went through the task, is exemplified in example 20: the original question is a bit nonsensical in relation to the answer. I didn't want to fix it because I wanted to maintain meaning, but the given answer doesn't really reflect the question. Perhaps something better would be "What model year preceded the [2016 Chevy Spark]?", but that does change the meaning.
    • instructions and examples are clear and helpful
  • task418_persent_title_generation.json
    • It might be helpful to provide more reasoning behind why a title is good. For example, should strong verbs be used?
    • instructions are clear, thank you
  • task442_com_qa_paraphrase_question_generation.json
    • These were impressive. So straight forward. I know exactly what the task is asking for
    • Examples should have been more diverse.
    • Please read th following instructions
    • none
  • task500_scruples_anecdotes_title_generation.json
    • Seems straightforward, but the first example feels a bit confusing because of the complexity of the situation. The second is very straightforward, and it's easy to see the clash point.
    • Don't put examples in the instructions. It says to imagine it's a social media post; does this mean I'm free to write a title in extreme shorthand, meme format, and whatever else I can think of? Be very specific as to how you want outputs formatted.
  • task510_reddit_tifu_title_summarization.json
    • All good
    • The instructions were good, but some of the titles were long, even though the lead-in says 7-12 words, ideally. A few titles felt long.
    • I thought the instructions were quite clear and the examples were very helpful.
    • This HIT seems interesting. While the directions seem a little open to interpretation there is enough to feel confident on how to proceed.
  • task520_aquamuse_answer_given_in_passage.json
    • The instructions seem very helpful.
    • How should bad questions or questions with ambiguous answers be handled? What if the question is only partially answered in the passage?
  • task569_recipe_nlg_text_generation.json
    • A few more examples
    • When there is not an amount (cup, oz) it says "None", so "1 None egg" doesn't really make sense.
  • task602_wikitext-103_answer_generation.json
    • Before giving examples explain what the task actually is.
    • It seems pretty cut and dry but I will do my best.
  • task613_politifact_text_generation.json
    • none
  • task614_glucose_cause_event_detection.json
    • Everything was fine
  • task619_ohsumed_abstract_title_generation.json
    • none
  • task620_ohsumed_medical_subject_headings_answer_generation.json
    • The instruction was very clear and the examples are pretty helpful.
    • Everything was great and helpful.
    • the instructions are pretty helpful. No need to improve the instruction.
  • task623_ohsumed_yes_no_answer_generation.json
    • It was fairly daunting and overwhelming to look at those instructions. Bullet points could have sufficed to simply say yes or no if the abstract and title go together.
    • The instruction and examples are pretty helpful. No issues with the instruction here.
    • No issues everything was pretty good.
  • task640_esnli_classification.json
    • Nothing
    • Needs more examples
  • task641_esnli_classification.json
    • instructions and examples are quite helpful. the requirement on response is strange though.
    • Very confusing!!! I get the gist, but it's confusing.
  • task642_esnli_classification.json
    • instruction and examples are helpful because the task itself is simple
    • "Given Sentence 1 and Sentence 2, indicate your answer as yes when the two sentences clearly agree/disagree with each other or no when it can't be determined." ... this is so confusing (the clearly agree/disagree... these are completely opposite trains of thought). Just have three options here: agree, disagree, and can't determine.
    • none
  • task645_summarization.json
    • Provide both positive and negative outputs for the same input. Clarify how you want ambiguous outputs handled, like "The Province" below, which is the name of a newspaper.
    • You could properly punctuate/capitalize everything
    • The only thing I might add is to make clear that we're not supposed to give an explanation of the Output. It took me a second to figure that out. Mostly because the size of the text fields made it seem like there should be more text than what an Output phrase would generate.
    • The English is a little confusing but I was overall able to figure it out in the end.
  • task648_answer_generation.json
    • Some examples include multiple identical pronouns (e.g. her trainer/her boyfriend) and it's not clear if one should be chosen over the other. "Generate the word in the sentence to which the pronoun in the input is referring" isn't very clear: the most obvious interpretation is that it refers to the item a possessive pronoun is describing, but not every pronoun in the examples or main sentences *is* possessive.
    • Maybe more detailed explanations after the examples.
    • nothing
    • Should say something like please include both the pronoun and the associated noun. It should be noted that the work below looks nothing like the associated examples above, making the examples useless.
  • task670_ambigqa_question_generation.json
    • I found it easy to follow what makes something good or not.
    • Are we just supposed to make up the restrictions ourselves? Listen, googling is simply not going to happen. You're never going to pay enough to make that worth it. I don't know that there were 2 types of Big Brother shows and I wouldn't even know how to google that information. This is just a bad task.
    • easy to understand (only typo I noticed is Texas, which is not a city)
  • task671_ambigqa_text_generation.json
    • good
    • none
  • task677_ollie_sentence_answer_generation.json
    • none
    • I think it would help to have bad examples corrected. I honestly have no idea what I'm supposed to do here. Example three is especially confusing because it says it contains none of the subjects but "fine" is in the sentence.
  • task743_eurlex_summarization.json
    • none
    • Instruction is very clear
  • task760_msr_sqa_long_text_generation.json
    • typos
    • Everything good.
    • Good instructions
  • task769_qed_summarization.json
    • none
  • task827_copa_commonsense_reasoning.json
    • No need to improve, the instruction is pretty good
    • Input/Output is rather mechanical. Why not Premise/Explanation?
    • give more examples of positive, negative and hard to tell answers
    • none
    • the instruction is pretty clear and the examples are very helpful. No issues.
  • task828_copa_commonsense_cause_effect.json
    • The instructions and examples were pretty helpful.
    • Not bad, as you explained, but I would suggest adding another negative example about what is not a cause in this context. Also explain a little more about why the positive examples correspond to causes or effects, as the case may be.
    • Seems understandable, but again, it should be clearer. Maybe for the options where you select choices, it could be a "choice selection box" instead of typing? Typing seems way too open-ended.
  • task879_schema_guided_dstc8_classification.json
    • The instructions are clear enough but the verbiage "output" is confusing in this context. As I understand it, for the questions below, I need to write either "Yes" or "No". I'm not 100% sure, however. It would be simpler to just ask if there is a question being asked or not. Also, at the top, "Please read th following instructions" is missing an "e".
    • There is a typo in the first sentence "Please read th following instructions". The part about the output should be 0 or 1 is a little confusing since I don't see that later on in the task.
    • the instruction and examples are helpful because the task is simple
  • task880_schema_guided_dstc8_classification.json
    • examples are helpful, instruction is a bit vague
    • Real straightforward, simple instructions; maybe a key always on screen would be nice. Also, I am having a bit of a hard time understanding the difference between offer and inform; it seems like they are rather similar.
    • none
    • More examples would be helpful especially regarding OFFER vs INFORM.
  • task890_gcwd_classification.json
    • Nothing
    • For the "neutral" stance more examples or more explicit instructions about what is neutral might help. I got, from the examples of bad outputs, that an input that doesn't mention/directly speak about global warming is neutral, but that wasn't explicitly stated. But, if it mentions global warming, or evidence of global warming, but doesn't refer to any causes at all - is that neutral? Or should I google the evidence of global warming in the input, see if it is commonly attributed to humans or not, and then answer based on that info? For example if it says "Rising CO2 levels create higher global temps" is that neutral, since it doesn't address what causes rising CO2 levels? Or agrees since a quick google search would show that manufacturing, transportation, etc. emit/cause CO2 to rise?
    • the instruction was so clear with the examples.
  • task891_gap_coreference_resolution.json
    • Give a simple explanation of exactly what the task involves at the very beginning, before any drop-down menus or examples are even mentioned. Also, in the initial instruction, make sure you mention that you're looking for single-word or 2-3 word responses. This prevents workers from looking at the task and immediately throwing it back into the work pool. People see a big text box with 20 individual questions and immediately think it's a writing-intensive task. Finally, Example 7 on this page could have multiple correct responses. Example 9 is asking for a reference of **her** that is a part of a movie title, I believe. There is no reference to be found in this one. Example 18 could have "Medbouh" or "Rogers" as a correct response.
    • Please read th following instructions
    • Example 2 in the positive section is confusing. Connie is the sister of Gordon, so how could "her" be referring to Connie's marriage to Gordon?
  • task892_gap_reverse_coreference_resolution.json
    • 'The' in the first sentence is missing an e. It's better to group the listed pronouns by gender just to avoid any potential confusion. Some of the examples make no sense when the name is replaced with a pronoun.
    • The passage should be more comprehensive
    • negative examples are unclear because they don't provide correct answers
    • none
    • Does not say what to do if the passage doesn't definitively show a pronoun for the name.
  • task893_gap_fill_the_blank_coreference_resolution.json
    • The "given blank" could be more obvious. It is very small and I didn't even know what it was at first.
    • I think that this is pretty self-explanatory: read the passage, place the correct pronoun.
  • task935_defeasible_nli_atomic_classification.json
    • Fixing the grammar in the examples, like "PersonX then good activity"
    • No, it's fine. You need more than a 60-minute timer on this particular set of HITs in case people catch more than one, because now I'm rushing. Also, example 14? I know you want me to put "strengthener", but I would trust a cop LESS, so this is an opinion question, tbh. Also, radio buttons instead of having to type or copy-paste those two words 20 times.
    • Both examples of positive/good outputs don't really make sense to me.
    • The instruction seems pretty helpful and the examples are very helpful.
  • task936_defeasible_nli_snli_classification.json
    • good examples, makes sense, no errors detected
    • Again, confusing as heck!
  • task937_defeasible_nli_social_classification.json
    • The paragraph instruction was most helpful. The examples are good, but it's a little confusing whether I should be writing another example of a strengthener/weakener, or just the two words themselves.
    • I think that for this one a few more good examples would be helpful
    • Straightforward and clear
  • task957_e2e_nlg_text_generation_generate.json
    • You may want to include at least one bad example which uses bad grammar.
    • The word "the" is misspelled in the instructions.
  • task970_sherliic_causal_relationship.json
    • more examples could be helpful, or using examples in more concrete terms
    • These are pretty straight forward, any confusion was cleared up by the examples
    • The instructions and examples seem clear.
    • The instruction and examples are very clear.

Palipoor commented on September 8, 2024

@yeganehkordi I will start from 200.

yeganehkordi commented on September 8, 2024

> @yeganehkordi I will start from 200.

Sounds good.

danyaljj commented on September 8, 2024

In task1612_sick_label_classification.json someone output descriptive labels (entailment, etc) while we expected 0, 1, 2. We should improve the instructions so that there is no confusion on this:

In this task, you're given a pair of sentences, sentence 1 and sentence 2. Your job is to choose whether the two sentences clearly agree (entailment)/disagree (contradiction) with each other, or if this cannot be determined (neutral). Your answer must be in the form of the letters 0 (entailment), 1 (neutral), or 2(contradiction).
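Before rewording, it may be worth confirming what the reference outputs actually look like. A minimal sketch, assuming the standard "Instances" list with an "output" field that holds either a single string or a list of acceptable answers:

```python
import json
from collections import Counter

# Minimal sketch: tally the reference outputs of task1612 to confirm they are
# the digits "0"/"1"/"2", so the revised instructions can say "numbers" rather
# than "letters". Field names are assumptions about the standard task layout.
with open("tasks/task1612_sick_label_classification.json") as f:
    task = json.load(f)

labels = Counter()
for instance in task["Instances"]:
    outputs = instance["output"]
    if not isinstance(outputs, list):
        outputs = [outputs]
    for out in outputs:
        labels[out] += 1

print(labels)  # expected keys: "0", "1", and "2"
```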

danyaljj commented on September 8, 2024

@yeganehkordi @Palipoor am I correct to assume that this is addressed? Checking if we should close this ...

yeganehkordi commented on September 8, 2024

Yes, we can close this issue.
