Benchmarking LLM

About The Project

Our project introduces a metric designed to evaluate the quality of textual summaries. This metric is pivotal in fields like finance, where precise information synthesis is critical.

Quality Discrimination: Separates stronger summaries from weaker ones, with score differences that reflect differences in factual accuracy.
Factual Accuracy Measurement: Detects and quantifies any factual deviations, assigning lower scores to less accurate summaries.
Detail-Oriented Assessment: Provides comprehensive evaluations, focusing on how well the summary captures the essence and details of the original text.

This metric is not merely a tool for evaluation; it's a step towards enhancing the integrity of information processing in sectors where factual accuracy is non-negotiable.

Framework

Named Entity Comparison: Extracts finance-related named entities from the summary and the original text and compares them, then analyzes and visualizes how accurately and completely the summary reproduces the entities in the original.
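A minimal sketch of the entity-comparison idea, using only the standard library. It assumes a regex stand-in for the NER step (the dependency list suggests the project uses spaCy or Stanza instead), and the names `extract_entities` and `compare_entities` are hypothetical, not taken from `src/`.

```python
import re

# Assumption: a regex stand-in for a real NER model. It only recognizes
# dollar amounts, percentages, and four-digit years.
ENTITY_PATTERN = re.compile(
    r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s(?:million|billion|trillion))?"  # money
    r"|\d+(?:\.\d+)?%"                                                # percentages
    r"|\b(?:19|20)\d{2}\b"                                            # years
)


def extract_entities(text: str) -> set[str]:
    """Return the set of entity strings matched in the text."""
    return set(ENTITY_PATTERN.findall(text))


def compare_entities(source: str, summary: str) -> dict:
    """Compare the entities in a summary against those in its source."""
    source_entities = extract_entities(source)
    summary_entities = extract_entities(summary)
    supported = summary_entities & source_entities
    return {
        "supported": supported,                              # in both texts
        "unsupported": summary_entities - source_entities,   # only in summary
        "missing": source_entities - summary_entities,       # only in source
        # Share of summary entities that the source backs up.
        "precision": len(supported) / len(summary_entities) if summary_entities else 1.0,
        # Share of source entities that the summary keeps.
        "recall": len(supported) / len(source_entities) if source_entities else 1.0,
    }


source = "In 2023, revenue rose 12% to $4.2 billion, while costs fell 3%."
summary = "Revenue rose 15% to $4.2 billion in 2023."
print(compare_entities(source, summary))
```

Here the altered figure `15%` ends up in `unsupported`, which lowers precision, while the figures the summary drops lower recall.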

Sentence-Level Summary Checking: Uses LLMs to check the summary against the original text sentence by sentence, and highlights the inconsistencies it finds for in-depth analysis.
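The sentence-by-sentence loop can be sketched as follows. This is an illustration under assumptions, not the project's pipeline: the `judge` callable is a hypothetical hook where an LLM call would go, and `numbers_match_judge` is a crude offline placeholder so the example runs without an API key.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check_summary(
    source: str,
    summary: str,
    judge: Callable[[str, str], bool],
) -> list[dict]:
    """Ask the judge whether each summary sentence is consistent with the source."""
    return [
        {"sentence": sentence, "consistent": judge(source, sentence)}
        for sentence in split_sentences(summary)
    ]


def numbers_match_judge(source: str, sentence: str) -> bool:
    """Placeholder judge (assumption): a sentence passes if every number
    it mentions also appears in the source. An LLM-backed judge would
    replace this function."""
    number = r"\d+(?:\.\d+)?"
    source_numbers = set(re.findall(number, source))
    return all(n in source_numbers for n in re.findall(number, sentence))


source = "Revenue rose 12% in the third quarter. The company hired 300 employees."
summary = "Revenue rose 12%. The company hired 500 employees."

results = check_summary(source, summary, numbers_match_judge)
flagged = [r["sentence"] for r in results if not r["consistent"]]
print(flagged)
```

Keeping the judge as a parameter makes it possible to swap models, or to test the loop offline, without touching the sentence-splitting logic.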

Directory Tree

│   .gitignore
│   LICENSE.txt
│   README.md
│
├───config
│       config.sh
│       requirements.txt
│
├───data
│       10summary_with_result.csv
│       falsified_summary.csv
│       falsified_summary_level.csv
│       final_version_cropped_first1000.csv
│       final_version_withouttext.csv
│
├───doc
│   ├───About_Us
│   │       Team's Bio.pdf
│   │
│   ├───Academic Paper
│   │       5054_factuality_enhanced_language_m.pdf
│   │       Evaluating Factuality.pdf
│   │       Evaluating the Factual Consistency.pdf
│   │
│   ├───Project Description
│   │       Benchmarking LLM .pdf
│   │       CAPSTONE PROJECT PROPOSAL Fidelity Summarization Metrics.pdf
│   │
│   └───Report
│           Capstone Project Initial Due Diligence Report.pdf
│           F23_Fidelity_Benchmarking LLM_1st_report.pdf
│           F23_Fidelity_Benchmarking LLM_final_report.pdf
│           F23_Fidelity_BenchmarkLLM_poster.pdf
│           Project Proposal.pdf
│
├───res
│   │   10levels.svg
│   │   good_to_bad.svg
│   │   LLM_Assisted_Framework.jpg
│   │   NER_Framework.jpg
│   │
│   └───Baseline
│           Boxplot_for_Scores.png
│
├───samples
│       documents_extraction.ipynb
│       presentation.ipynb
│       summary_level_with_result.csv
│
├───src
│   │   Bart.py
│   │   PaLM.py
│   │   pipeline.py
│   │   summary_generation.py
│ 
│
└───test
    ├───Data_Pipeline
    └───Summary_Generation
            bart.ipynb
            llama2.ipynb
            PaLM2.ipynb
            test.py

Getting Started

Dependencies

python==3.10.0
ipython==8.15.0
nltk==3.8.1
numpy==1.24.3
openai==1.3.7
pandas==1.5.3
python-dotenv==1.0.0
rouge_score==0.1.2
scikit_learn==1.2.2
sentence_transformers==2.2.2
spacy==3.7.2
stanza==1.6.1

Configuration

1. Environment setup

Set up with a Python virtual environment

bash ./config/config.sh

Set up with conda

conda install --file ./config/requirements.txt

2. OpenAI API setup

import os
import sys

sys.path.append('../src/')  # make the modules in src/ importable
import pipeline

os.environ['OPENAI_API_KEY'] = 'Your OpenAI API Key'  # replace with your own key
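As an alternative to hardcoding the key in a notebook, the key can be read from the environment and checked up front. The helper below is a hypothetical sketch (the name `get_openai_api_key` is not part of this repository) that fails with a clear message when the key is missing.

```python
import os
from typing import Mapping


def get_openai_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key from the environment, or raise if it is unset.

    Hypothetical helper for illustration; not part of src/.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it in your shell or add it "
            "to a local .env file before running the pipeline."
        )
    return key
```

Since python-dotenv is among the dependencies, calling `load_dotenv()` before this helper would populate `os.environ` from a local `.env` file, which keeps the key out of version control.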

Usage

The data extraction process is in samples/documents_extraction.ipynb.

The demo, along with a comparison of the results against baseline metrics, is in samples/presentation.ipynb.
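The text does not name the baseline metrics. Since `rouge_score` appears in the dependency list, the sketch below assumes an n-gram overlap metric is among them and shows why such a baseline can miss factual errors. It is a simplified unigram-overlap F1 in the spirit of ROUGE-1, written with the standard library only; it is not the `rouge_score` implementation.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def unigram_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1-style F1 based on clipped unigram overlap."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "Revenue rose 12 percent to 4 billion dollars"
falsified = "Revenue fell 12 percent to 4 billion dollars"

print(unigram_f1(reference, reference))
print(unigram_f1(reference, falsified))
```

Changing one word reverses the meaning of the sentence, yet seven of the eight tokens still match, so the overlap score stays high. A metric aimed at factual accuracy is meant to penalize this kind of change more heavily.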

Report

Initial Due Diligence Report: Initial Due Diligence Report
Project Proposal: Project Proposal
1st Milestone Report: 1st Milestone Report
Final Report: Final Report
Poster: Poster

License

About us

Group Members

Cong Chen (cc4887)

Longxiang Zhang (lz2869)

Ruolan Lin (rl3312)

Taichen Zhou (tz2555)

Yichen Huang (yh3550) - Team Captain

Fidelity Mentors

Lilli Ann Rowan, Indraneel Biswas, Michael Threlfall, and Diana Kulmizev

Instructor/CA

Adam Kelleher
