Evaluating GPT in Japanese Bar Examination: Insights and Limitations

Abstract

Large language models (LLMs) such as ChatGPT have been reported to exceed the accuracy of human experts on a wide range of tasks. A recent study reports that ChatGPT passed the Japanese National Medical Examination, confirming its strong performance in Japanese. We evaluated the accuracy of GPT-3, GPT-4, and ChatGPT on the multiple-choice section of the Japanese Bar Examination, covering Constitutional Law, Civil Law, and Criminal Law over the past five years. The results show that the current correct answer rate on these exams is only 30-40% (compared to the average pass rate of 70%), which is very low. Beyond the overall correct answer rate, we broke down the knowledge and reasoning each question requires and examined LLM performance from each perspective. The findings show that 1) LLMs have knowledge of many statutes, 2) they achieve a high correct answer rate on questions that require an understanding of legal theories but no knowledge of specific statutes or case law, and 3) they achieve a low correct answer rate on questions that require knowledge of case law. The primary reason for their lower performance compared with the American Bar Examination appears to be a lack of knowledge of Japanese law, especially case law.
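The per-category analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the record layout, the field names (`subject`, `needs`, `correct`), and the toy records are all hypothetical and do not reproduce any result from the paper. The sketch only shows how a correct answer rate could be computed per subject and per type of required knowledge.

```python
from collections import defaultdict

# Hypothetical toy records for illustration only; NOT data from the paper.
# "needs" is an assumed label for the kind of knowledge a question requires.
records = [
    {"subject": "Constitutional Law", "needs": "case_law", "correct": False},
    {"subject": "Constitutional Law", "needs": "legal_theory", "correct": True},
    {"subject": "Civil Law", "needs": "statute", "correct": True},
    {"subject": "Civil Law", "needs": "case_law", "correct": False},
    {"subject": "Criminal Law", "needs": "legal_theory", "correct": True},
    {"subject": "Criminal Law", "needs": "case_law", "correct": True},
]


def accuracy_by(rows, key):
    """Return {group: fraction of correct answers}, grouping rows by `key`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        hits[row[key]] += int(row["correct"])
    return {group: hits[group] / totals[group] for group in totals}


by_subject = accuracy_by(records, "subject")
by_need = accuracy_by(records, "needs")

for group, rate in sorted(by_need.items()):
    print(f"{group}: {rate:.0%}")
```

Grouping the same graded answers by different keys makes it possible to separate an overall score from the contribution of each knowledge type, which is the kind of breakdown the abstract reports.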

概要

ChatGPTなどの大規模言語モデルが，多岐にわたるタスクにおいて人間の専門家の精度を上回ると報告されている．とくに日本の医師国家試験にChatGPTが合格したという最近の研究報告からも，日本語についての高い性能が確認されている．本研究では，日本の司法試験（短答式）の憲法，民法，刑法それぞれ過去5年分を対象に，GPT-3, GPT-4およびChatGPTの精度を評価した．結果として，現段階では日本の司法試験に対する正答率が3〜4割と非常に低いことが明らかになった．本研究では，単なる正解率にとどまらず，回答に必要な知識，能力を分解し，それぞれの観点での大規模言語モデルの性能を検証した．その結果，1)大規模言語モデルは多くの条文の知識を有していること，2)特定の条文や判例の知識を必要としないが学説の理解を必要とする問題に関しては正解率が高いこと，3)判例の知識を必要とする問題に関しては正解率が低いこと，が示された．アメリカの司法試験と比較して性能が低い原因の大部分は，日本法の知識，とくに判例の知識の乏しさにあると考えられる．

bib (en)

@techreport{choi_et_al_2023_j_bar_exam_en,
  author = {Choi, Jungmin and Kasai, Jungo and Sakaguchi, Keisuke},
  title = {{Evaluating GPT in Japanese Bar Examination: Insights and Limitations}},
  url = {https://github.com/keisks/j_bar_exam},
  year = {2023},
  month = {12},
}

bib (jp)

@techreport{choi_et_al_2023_j_bar_exam_jp,
  author = {チェ,ジョンミン and 笠井,淳吾 and 坂口,慶祐},
  title = {{日本の司法試験を題材としたGPTモデルの評価}},
  url = {https://github.com/keisks/j_bar_exam},
  year = {2023},
  month = {12},
}

Paper PDF

https://jxiv.jst.go.jp/index.php/jxiv/preprint/view/559
