
The official evaluation suite and dynamic data release for MixEval.

Home Page: https://mixeval.github.io/


mixeval's Introduction

🏠 Homepage | πŸ† Leaderboard | πŸ“œ arXiv | πŸ€— HF Dataset | πŸ€— HF Paper | 𝕏 Twitter




Benchmark correlations (%) with Chatbot Arena Elo, against the total costs of evaluating a single GPT-3.5-Turbo-0125 model. MixEval and MixEval-Hard show the highest correlations with Arena Elo and Arena Elo (En) among leading benchmarks. We reference the crowdsourcing price for Amazon Mechanical Turk ($0.05 per vote) when estimating the cost of evaluating a single model on Chatbot Arena (approximately $2,936). Chatbot Arena is prohibitively expensive, while MixEval and MixEval-Hard are cheap and cost-effective alternatives. For more details, please refer to our paper.


⚡ News

[2024-06-29] Our evaluation suite now supports evaluating local checkpoints; check here for details!

[2024-06-29] Our evaluation suite now supports other APIs for the model parser; check here.

MixEval

We introduce MixEval, a ground-truth-based dynamic benchmark derived from off-the-shelf benchmark mixtures. It evaluates LLMs with a highly capable model ranking (0.96 correlation with Chatbot Arena) while running locally and quickly (6% of the time and cost of running MMLU), and its queries are stably and effortlessly updated every month to avoid contamination.

The MixEval suite consists of two benchmarks, MixEval and MixEval-Hard, both updated periodically with our fast, stable pipeline. Each contains two splits, free-form and multiple-choice. Their relationship is shown below:

 MixEval (dynamic)
    │
    ├── MixEval
    │   ├── free-form.json
    │   └── multiple-choice.json
    │
    └── MixEval-Hard
        ├── free-form.json
        └── multiple-choice.json

See our homepage and paper for more details!
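
Each split is a plain JSON file. For quick inspection outside the evaluation suite, a minimal sketch is shown below; the local path is hypothetical and the exact layout may vary between versions, so the snippet handles both a JSON array and a dict keyed by example id.

import json

# Hypothetical local path; point this at the split you downloaded from the HF dataset.
path = "mix_eval/data/mixeval-2024-06-01/free-form.json"

with open(path, encoding="utf-8") as f:
    data = json.load(f)

# The split may be a list of records or a dict keyed by example id.
records = list(data.values()) if isinstance(data, dict) else data

print(f"Loaded {len(records)} free-form examples")
print(records[0].get("prompt"))  # each record carries a query and its gold target(s)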


Click-and-Go LLM Evaluation Suite

This repository hosts the evaluation code and dynamic data release for MixEval. The current dynamic benchmark version is displayed at the top of this page. We offer a reliable click-and-go evaluation suite compatible with both open-source and proprietary models, which includes model response generation and score computation. Additionally, this evaluation suite facilitates straightforward registration of custom models and benchmark data.

As demonstrated in the paper, traditional rule-based parsers exhibit significant instability and are prone to considerable errors. We therefore employ GPT-3.5-Turbo or open-source models as the model parser, which has been shown to be stable in both our study and this study.

ATTENTION❗ Feel free to use your own evaluation code to evaluate with MixEval data. We provide the guidelines here.


Quick Start

(Step 1) Clone repo and setup the environment:

git clone https://github.com/Psycoy/MixEval.git
cd MixEval
conda create -n MixEval python=3.11 --yes
conda activate MixEval
bash setup.sh

# setup done

(Step 2) Set up the OpenAI API key for the model parser. Create a .env file under the root directory (MixEval/) and add the line below to it:

MODEL_PARSER_API=<your openai api key>
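
To sanity-check that the key will be picked up before launching a long run, you can load the .env yourself. This is only a hedged convenience sketch and assumes the python-dotenv package is available (pip install python-dotenv if it is not):

import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

# Reads MixEval/.env when run from the repository root.
load_dotenv()

key = os.getenv("MODEL_PARSER_API")
assert key, "MODEL_PARSER_API is not set; the model parser will fail without it."
print(f"Model parser key loaded ({len(key)} characters).")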

The values on the Leaderboard use GPT-3.5-Turbo-0125 as the default model parser. Open-source model parsers will also be supported.

If you are using Azure or other APIs for the model parser, check here.

(Step 3) Run evaluation and get results. That's all!

python -m mix_eval.evaluate \
    --model_name gemma_11_7b_instruct \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --max_gpu_memory 5GiB \
    --output_dir mix_eval/data/model_responses/ \
    --api_parallel_num 20

If you want to evaluate models that are not included in mix_eval.models.__init__, see here for the simple steps of new model registration.

This command will run both inference and score computation. If you want to run model inference only, check here; if you want to run score computation only, check here.

Model response files and scores will be saved to <output_folder>/<model_name>/<benchmark>/<version>/, and in this case, it's mix_eval/data/model_responses/gemma_11_7b_instruct/mixeval_hard/2024-06-01/. We take the overall score as the reported score in Leaderboard.

Check here if you are evaluating a local checkpoint.

ATTENTION❗ It's important to read the essential configurations here before running the evaluation.


Registering New Models

(Step 1) Add your model file to mix_eval/models/ with the name your_model_name.py and write the model class in it with the name Model_Class_Name.

  • Open-source chat models inherit from mix_eval.models.base.ChatModel (example file: llama_3_8b_instruct.py).
  • Open-source base models inherit from mix_eval.models.base.BaseModel (example file: llama_3_8b.py).
  • Proprietary models inherit from mix_eval.models.base_api.APIModelBase (example file: gpt_4_turbo_2024_04_09.py; add your API key to .env).
  • In most cases, all you need to do is write a simple model class with a single __init__ function. However, if your model needs more setup, e.g., it requires a different build_model() function, you should override the corresponding function or variable of the parent model.
  • The model file name should be the same as the name you pass to the @register_model() decorator on top of the model class.

(Step 2) Add your model to mix_eval.models.__init__.AVAILABLE_MODELS.

  • The entry you add should be in the form of your_model_name: Model_Class_Name. See other models in AVAILABLE_MODELS as a reference.
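
For illustration, here is a hedged sketch of what a minimal chat-model file (mix_eval/models/my_chat_model.py) could look like; the import path of register_model and the base-class signature are assumptions, so mirror an existing file such as llama_3_8b_instruct.py rather than copying this verbatim:

# mix_eval/models/my_chat_model.py -- hypothetical example, not part of the repo
from mix_eval.api.registry import register_model  # assumed import path; check an existing model file
from mix_eval.models.base import ChatModel


@register_model("my_chat_model")  # must match the file name my_chat_model.py
class MyChatModel(ChatModel):
    def __init__(self, args):
        super().__init__(args)
        # Hypothetical Hugging Face repo id of the checkpoint to evaluate.
        self.model_name = "my-org/my-chat-model-7b"
        # Override only what differs from the parent, e.g. the system message.
        self.SYSTEM_MESSAGE = None

Then register it in mix_eval.models.__init__.AVAILABLE_MODELS with an entry of the form my_chat_model: MyChatModel, mirroring the existing entries.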

Only Performing Model Inference

Sometimes you may want to do model inference without computing the scores. You can achieve this by setting the --inference_only flag when running the mix_eval.evaluate module:

python -m mix_eval.evaluate \
    --model_name gemma_11_7b_instruct \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --max_gpu_memory 5GiB \
    --output_folder mix_eval/data/model_responses/ \
    --inference_only

Model response files will be saved to <output_folder>/<model_name>/<benchmark>/<version>/, and in this example it's mix_eval/data/model_responses/gemma_11_7b_instruct/mixeval_hard/2024-06-01/.

Check here if you are evaluating a local checkpoint.

ATTENTION❗ It's important to read the essential configurations here before running the evaluation.

You can check whether the model response files are complete after running the inference:

python -m mix_eval.utils.check_eval_complete \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --chat_models_to_check \
    gpt_4o \
    llama_3_70b_instruct \
    claude_3_opus \
    --base_models_to_check \
    none \
    --model_response_dir mix_eval/data/model_responses/ \
    --out_path mix_eval/data/model_responses/eval_checks.log

The checking results will be written to --out_path; only problematic files will be recorded.


Only Computing Scores

If you want to separately compute the scores, you should

  1. Prepare your model response files. You can use either our evaluation suite (see here) or your own (see the example response file formats and protocols specified here).
  2. Run the score computation script:
    python -m mix_eval.compute_metrics \
        --benchmark mixeval_hard \
        --version 2024-06-01 \
        --model_response_dir mix_eval/data/model_responses/ \
        --api_parallel_num 20 \
        --models_to_eval \
        gemma_11_7b_instruct \
        gpt_4o \
        claude_3_opus
    

Set --api_parallel_num appropriately for your OpenAI user tier to avoid rate limits. In general, a Tier-5 user can set --api_parallel_num to 100 or more and parse the results within about 30 seconds.
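
Conceptually, --api_parallel_num just bounds how many parser requests are in flight at once, similar to the thread-pool sketch below (illustrative only, not the suite's actual implementation; parse_one is a hypothetical stand-in for a single judge call):

from concurrent.futures import ThreadPoolExecutor

API_PARALLEL_NUM = 20  # lower this if you hit rate limits for your OpenAI tier

def parse_one(record):
    # Hypothetical stand-in for one model-parser (judge) API call.
    ...

def parse_all(records):
    # At most API_PARALLEL_NUM judge calls run concurrently.
    with ThreadPoolExecutor(max_workers=API_PARALLEL_NUM) as pool:
        return list(pool.map(parse_one, records))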

If you are using Azure or other APIs for the model parser, check here.

If you are parsing base models' responses, set the --extract_base_model_response flag to retain only the meaningful part of the models' responses, which yields more stable parsing results.

If you finished the model parsing some time ago and now want to display the results again, add the --compute_score_from_judged_file flag to avoid calling the model parser API again and save your budget. Make sure that parsed files named judge_results_ff_model_judge_gpt-3.5-turbo-0125 and judge_results_mp_model_judge_gpt-3.5-turbo-0125 exist under the target model response folder, where gpt-3.5-turbo-0125 denotes the model parser name, ff denotes free-form, and mp denotes multiple-choice.
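
A small, hypothetical helper (not part of the suite) to confirm those judged files exist before adding the flag:

import os

def has_judged_files(model_response_dir, model, benchmark, version,
                     parser="gpt-3.5-turbo-0125"):
    """Check for the judged free-form (ff) and multiple-choice (mp) result files."""
    folder = os.path.join(model_response_dir, model, benchmark, version)
    prefixes = [
        f"judge_results_ff_model_judge_{parser}",
        f"judge_results_mp_model_judge_{parser}",
    ]
    names = os.listdir(folder)
    # startswith allows for any file extension the suite may append.
    return all(any(n.startswith(p) for n in names) for p in prefixes)

print(has_judged_files("mix_eval/data/model_responses/",
                       "gemma_11_7b_instruct", "mixeval_hard", "2024-06-01"))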


What is MixEval?

See our homepage and paper for more details!

MixEval is an approach that bridges the gap between real-world user queries and efficient, reproducible evaluation by leveraging user queries mined from the web and matching them with similar queries from existing benchmarks. MixEval is also the proposed benchmark built with this approach.

MixEval-Hard is the hard version of MixEval, designed to enhance the benchmark's ability to distinguish strong models. It is sampled from MixEval based on model evaluation results, with a higher probability of selecting harder queries. To address distribution deviation, we introduce a rejective sampling process to ensure that the distribution of MixEval-Hard aligns with that of wild queries.
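
As a rough illustration of the rejective sampling idea (a simplified sketch, not the paper's exact procedure), harder queries are proposed more often but are rejected whenever keeping them would push a category's share too far from the wild-query distribution:

import random

def rejective_sample(candidates, target_dist, n, tol=0.05, seed=0):
    """candidates: list of (query, category, hardness_weight);
    target_dist: category -> desired share (the wild-query distribution)."""
    rng = random.Random(seed)
    kept, counts = [], {c: 0 for c in target_dist}
    weights = [w for _, _, w in candidates]
    while len(kept) < n:
        # Propose harder queries more often...
        query, cat, _ = rng.choices(candidates, weights=weights)[0]
        # ...but reject proposals that would over-represent a category.
        if (counts[cat] + 1) / (len(kept) + 1) <= target_dist[cat] + tol:
            kept.append(query)
            counts[cat] += 1
    return kept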

Dynamic evaluation is introduced to mitigate the contamination issue. We periodically update the data points in MixEval and MixEval-Hard using our fast, stable pipeline, which performs the benchmark mixture with a different batch of wild queries from the same distribution, yielding low score variance across versions (0.36 std. on a 0-100 scale) and a significant version difference (85% unique query ratio).


Why Use the MixEval Benchmarks?

MixEval offers five significant advantages for practitioners:

  • Accurate model ranking, demonstrated by a 0.96 correlation with Chatbot Arena.
  • Fast, cheap and reproducible execution, requiring only 6% of the time and cost of MMLU, with no dependence on human input.
  • Dynamic benchmarking enabled by a low-effort and stable updating mechanism.
  • A comprehensive and less biased query distribution, as it bases queries on a large-scale web corpus.
  • A fair grading process, ensured by the ground-truth-based grading mechanism.

How Effective is MixEval as a Benchmark Mixture Approach?

MixEval is effective as a benchmark mixture approach because:

  • MixEval and MixEval-Hard achieve the highest correlation with Arena Elo and Arena Elo (En) among all benchmarks.
  • MixEval improves the correlation with Arena Elo and Arena Elo (En) across all its main benchmark splits.
  • MixEval outperforms both benchmark-level and uniform mixtures.
  • MixEval effectively maps real-world user queries to ground-truth-based benchmarks.

🦾 Contribute

Feel free to hit the ⭐ star button or 🦾 contribute! We review new issues and PRs regularly and will acknowledge your contributions!

We would like to extend our heartfelt gratitude to the following contributors for their exceptional commitment to this repository:

  • @RodriMora
  • @teknium1
  • @philschmid
  • @carstendraschner

📑 Citation

If you found this repository useful, please consider 📑 citing:

@article{ni2024mixeval,
  title={MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures},
  author={Ni, Jinjie and Xue, Fuzhao and Yue, Xiang and Deng, Yuntian and Shah, Mahir and Jain, Kabir and Neubig, Graham and You, Yang},
  journal={arXiv preprint arXiv:2406.06565},
  year={2024}
}


mixeval's Issues

Question about the paper

Hi all,

Thanks for the nice work!

I had some questions regarding the paper:

  1. Could you give me more details about the length constraint you used when selecting questions based on similarities?
  2. Is the evaluation data available anywhere (the correctness scores on MixEval for the specific models used in your paper)?

Thank you!

Weird random answer if API endpoint is not available

When your API is not specified correctly, e.g., the judge endpoint responds with:

Error code: 404 - {'error': {'code': 'DeploymentNotFound', 'message': 'The API deployment for this resource does not exist. If you created the deployment within the last 5 minutes, please wait a moment and try again.'}}
Error in GPT_decode, retrying...

It is obvious that the benchmark should not complete, as the judge is not available. But in my case I somehow get a final response which looks like this:

{"problem_type": "multiple-choice", "context": "animal", "prompt": "What is the animal trying to accomplish?", "options": ["sand trap", "live long", "leave home", "feel pain", "eating"], "target": [1], "benchmark_name": "CommonsenseQA", "formated_input": "animal\nWhat is the animal trying to accomplish?\nA. sand trap\nB. live long\nC. leave home\nD. feel pain\nE. eating\nAnswer with the option letter from the given choices directly.", "id": "3", "response": "E. eating", 
"judge_response": null, 
"judge_option": "D"}

How can it be that the judge_response is null, yet there is a judge_option of "D"?
This means that somehow the benchmark runs through and is evaluatable despite the fact that no judge was ever used.
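
A minimal defensive check (hypothetical code, not from the repo, and assuming one JSON record per line) that would surface this failure instead of silently producing a score:

import json

# Hypothetical path to a judged multiple-choice result file.
path = "judge_results_mp_model_judge_gpt-3.5-turbo-0125.jsonl"

with open(path, encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        if record.get("judge_response") is None and record.get("judge_option"):
            raise RuntimeError(
                f"Record {record.get('id')} has a judge_option but no judge_response; "
                "the judge API call probably never succeeded."
            )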

If you want to reproduce fast:

  • Cut the eval data down to only 1-4 samples.
  • Specify wrong environment variables for the API endpoint.
  • Let it run for some time (in my case 30 min, vs. 2 min when the API endpoint is correctly specified).

regards

Requesting new models?

Is there a way to get results added to the leaderboard? Or can we request models to be added?

I'm interested in seeing the following models:

  • Qwen2
  • Phi3

Examples for open-source model judges & parsers

Hey, thanks for your great work!

Could you please provide examples of how to run the benchmarks with an open-source parser/judge as an alternative to GPT-3.5?
The README mentions that "Open-source model parsers are also supported," but I couldn't figure out how exactly to set them with mix_eval.evaluate, or whether any specific settings are required for running an open-source model.
Lastly, the paper mentions that "We will also provide an open-source model parser with its stability test to ensure long-term reproducibility." If you could provide such a tested open-source model, it would be amazing.

Thanks!

CUDA out of memory error

Hi!

I'm getting a "CUDA out of memory" error even though I'm benchmarking a small model with 4x3090s (96 GB VRAM):

Error:

Start to evaluate qwen_15_7b_chat's close_freeform_hard split.


Loading checkpoint shards: 100%|████████████████████████████████████████| 4/4 [00:04<00:00,  1.03s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading close-ended freeform hard data.
Sorting data based on input length.
Finished evaluating qwen_15_7b_chat's close_freeform_hard split. Used 8.58 minutes.


Start to evaluate qwen_15_7b_chat's close_multichoice_hard split.


Loading checkpoint shards: 100%|████████████████████████████████████████| 4/4 [00:03<00:00,  1.01it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading close-ended multichoice hard data.
Sorting data based on input length.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 252, in <module>
    raise e
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 243, in <module>
    eval(args)
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 235, in eval
    _eval(args)
  File "/home/ubuntu/MixEval/mix_eval/evaluate.py", line 200, in _eval
    model.get_responses(batch, response_file)
  File "/home/ubuntu/MixEval/mix_eval/models/base.py", line 212, in get_responses
    return self.get_closeended_responses(batch, response_file)
  File "/home/ubuntu/MixEval/mix_eval/models/base.py", line 236, in get_closeended_responses
    responses = self.chunk_generate(
  File "/home/ubuntu/MixEval/mix_eval/models/base.py", line 164, in chunk_generate
    outputs = model.generate(
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1736, in generate
    result = self._sample(
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2375, in _sample
    outputs = self(
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ubuntu/MixEval/.venv/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1163, in forward
    logits = logits.float()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.63 GiB. GPU 3 has a total capacty of 23.68 GiB of which 7.94 GiB is free. Including non-PyTorch memory, this process has 15.74 GiB memory in use. Of the allocated memory 14.25 GiB is allocated by PyTorch, and 1.18 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The command I'm using:

python -m mix_eval.evaluate \
    --model_name qwen_15_7b_chat \
    --benchmark mixeval_hard \
    --version 2024-06-01 \
    --batch_size 20 \
    --max_gpu_memory 96GiB \
    --output_dir mix_eval/data/model_responses/ \
    --api_parallel_num 20

System:
EPYC 7402
512GB RAM
4x3090's

I believe that should be enough VRAM to run the benchmark?

(Non) Reproducible Experiment Results

Hi,

I tried to reproduce the experiment results on an A100, using the Azure OpenAI API with GPT-35-Turbo-1106 as judge:

  • For Mistral 7B it was fine.
  • For Llama 3 8B it was 0.39 (mine) vs. 0.46 (yours). Do you have an idea why it is this far off?
  • I also tried the Nous Research version of Llama 3 8B Instruct.

I faced the same with some other models.
The system prompt had only a very small influence when used.

Can GGUF and EXL2 compatibility be added?

Hi!

I've been testing MixEval, but it requires using the full-precision models downloaded from HF. Due to hardware limitations, quantized models are really popular in the self-hosted community, especially in the GGUF format (using llama.cpp; high compatibility and decent speed with layers offloaded to VRAM) and EXL2 (using exllamav2; great speed, requires the model to be fully loaded in VRAM).

Default SYSTEM_MESSAGE for Llama 3 Instruct is "You are a pirate chatbot who always responds in pirate speak!"

It seems like this would generate bad benchmark results?

https://github.com/Psycoy/MixEval/blob/main/mix_eval/models/llama_3_8b_instruct.py#L18C1-L18C160

        self.SYSTEM_MESSAGE = {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"} # set to None if no system message

Note that this also seems to be an issue for llama_3_70b_instruct.py and zephyr_7b_beta.py.

Is this what was run to generate the scores on https://mixeval.github.io/#leaderboard ?!?!

Duplicates in benchmark data

Hi MixEval Team & @Psycoy ,

thanks for your repo and your work to improve open source LLM benchmarks!

Issue:
While testing I discovered the following: in the response files (including my model's) for MixEval-Hard free-form, there are duplicate tasks.

Possible Problem:
Is this intentional, or is it due to sampling with replacement, which is not intended?

Assumption:
My feeling is that you would like a certain overlap with the LMSYS arena but as many unique tasks as possible, right?

Sample from Repo:

{"problem_type": "free-form", "context": "Succession is now regulated by laws passed by the National Diet. The current law excludes women from the succession. A change to this law had been considered until Princess Kiko gave birth to a son. Until the birth of Prince Hisahito, son of Prince Akishino, on September 6, 2006, there was a potential succession problem, since Prince Akishino was the only male child to be born into the imperial family since 1965. Following the birth of Princess Aiko, there was public debate about amending the current Imperial Household Law to allow women to succeed to the throne. In January 2005, Prime Minister Junichiro Koizumi appointed a special panel composed of judges, university professors, and civil servants to study changes to the Imperial Household Law and to make recommendations to the government. The panel dealing with the succession issue recommended on October 25, 2005, amending the law to allow females of the male line of imperial descent to ascend the Japanese throne. On January 20, 2006, Prime Minister Junichiro Koizumi devoted part of his annual keynote speech to the controversy, pledging to submit a bill allowing women to ascend the throne to ensure that the succession continues in the future in a stable manner. Shortly after the announcement that Princess Kiko was pregnant with her third child, Koizumi suspended such plans. Her son, Prince Hisahito, is the third in line to the throne under the current law of succession. On January 3, 2007, Prime Minister Shinz\u014d Abe announced that he would drop the proposal to alter the Imperial Household Law.", "prompt": "How many months after a panel was formed did they make a decision to support a revision to the law?", "target": ["9", "9", "9", "9", "9", "10", "10", "9", "9"], "benchmark_name": "DROP", "formated_input": "Question: Succession is now regulated by laws passed by the National Diet. The current law excludes women from the succession. A change to this law had been considered until Princess Kiko gave birth to a son. Until the birth of Prince Hisahito, son of Prince Akishino, on September 6, 2006, there was a potential succession problem, since Prince Akishino was the only male child to be born into the imperial family since 1965. Following the birth of Princess Aiko, there was public debate about amending the current Imperial Household Law to allow women to succeed to the throne. In January 2005, Prime Minister Junichiro Koizumi appointed a special panel composed of judges, university professors, and civil servants to study changes to the Imperial Household Law and to make recommendations to the government. The panel dealing with the succession issue recommended on October 25, 2005, amending the law to allow females of the male line of imperial descent to ascend the Japanese throne. On January 20, 2006, Prime Minister Junichiro Koizumi devoted part of his annual keynote speech to the controversy, pledging to submit a bill allowing women to ascend the throne to ensure that the succession continues in the future in a stable manner. Shortly after the announcement that Princess Kiko was pregnant with her third child, Koizumi suspended such plans. Her son, Prince Hisahito, is the third in line to the throne under the current law of succession. 
On January 3, 2007, Prime Minister Shinz\u014d Abe announced that he would drop the proposal to alter the Imperial Household Law.\nHow many months after a panel was formed did they make a decision to support a revision to the law?\nAnswer the question shortly.", "id": "118", "response": "The panel made a decision to support a revision to the law **11 months** after it was formed."}

(line 15,16,17,18)
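
For reference, duplicates like this can be counted with a few lines (assuming one JSON record per line in the response file; adjust if your file is a single JSON array):

import json
from collections import Counter

# Hypothetical path; point this at your model's free-form response file.
path = "mix_eval/data/model_responses/my_model/mixeval_hard/2024-06-01/responses.jsonl"

counts = Counter()
with open(path, encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        counts[(record["benchmark_name"], record["prompt"])] += 1

dupes = {k: n for k, n in counts.items() if n > 1}
print(f"{len(dupes)} duplicated (benchmark, prompt) pairs")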

How do you see this?

Kind regards
Carsten
