butiscapstone4fidelity's Introduction

Benchmarking LLM


About The Project

Our project introduces a metric for evaluating the factual quality of text summaries. It targets fields such as finance, where precise information synthesis is critical.

  • Quality Discrimination: Distinguishes effectively between superior and inferior summaries, ensuring clear differentiation in their factual accuracies.
  • Factual Accuracy Measurement: Detects and quantifies any factual deviations, assigning lower scores to less accurate summaries.
  • Detail-Oriented Assessment: Provides comprehensive evaluations, focusing on how well the summary captures the essence and details of the original text.

This metric is not merely a tool for evaluation; it's a step towards enhancing the integrity of information processing in sectors where factual accuracy is non-negotiable.

Framework

Named Entity Comparison: Extracts finance-related named entities from the summary and the original text and compares them. Analyzes and visualizes which entities are present in the summary and whether they match the original text.
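
As a rough illustration of the named entity comparison idea, the sketch below uses a regular expression as a stand-in for a real NER model. The dependency list includes spaCy and Stanza, but this README does not say how they are used, so the extraction here is an assumption. The names `extract_financial_entities` and `entity_overlap` are hypothetical and are not taken from the repository.

```python
import re

# Hypothetical stand-in for an NER model: matches money amounts
# (e.g. "$4.5 billion") and percentages (e.g. "12%") only.
_ENTITY_RE = re.compile(
    r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s(?:million|billion))?|\d+(?:\.\d+)?%"
)


def extract_financial_entities(text: str) -> set[str]:
    """Return the set of money and percentage strings found in text."""
    return set(_ENTITY_RE.findall(text))


def entity_overlap(source: str, summary: str) -> dict:
    """Compare the entities in a summary against those in the source text."""
    src = extract_financial_entities(source)
    summ = extract_financial_entities(summary)
    supported = summ & src      # entities the source also contains
    unsupported = summ - src    # entities that appear only in the summary
    # Share of summary entities backed by the source (1.0 if there are none).
    precision = len(supported) / len(summ) if summ else 1.0
    return {"supported": supported, "unsupported": unsupported, "precision": precision}


source = "Revenue rose 12% to $4.5 billion, while costs fell 3%."
good = "Revenue grew 12% to $4.5 billion."
bad = "Revenue grew 21% to $4.5 billion."

print(entity_overlap(source, good)["precision"])   # 1.0
print(entity_overlap(source, bad)["unsupported"])  # {'21%'}
```

The falsified summary keeps the wording but changes one figure, so only an entity-level comparison separates it from the accurate one.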

Sentence-Level Summary Checking: Applies LLMs to check, sentence by sentence, whether the summary is consistent with the original text. Identifies and highlights the inconsistent sentences for in-depth analysis.
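
The sentence-level check can be sketched as a loop over summary sentences with a pluggable judge. This is a minimal sketch under stated assumptions: `check_summary`, `split_sentences`, and `toy_judge` are hypothetical names, and `toy_judge` is a deliberately simple stand-in for the LLM call, whose prompt and model are not described in this README.

```python
import re
from typing import Callable

# A judge takes (source_text, summary_sentence) and returns True when the
# sentence is consistent with the source. In practice this would wrap an
# LLM call; that wrapper is an assumption and is not shown here.
Judge = Callable[[str, str], bool]

_NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check_summary(source: str, summary: str, judge: Judge) -> dict:
    """Judge every summary sentence and collect the inconsistent ones."""
    sentences = split_sentences(summary)
    flagged = [s for s in sentences if not judge(source, s)]
    # Score is the fraction of sentences judged consistent.
    score = 1 - len(flagged) / len(sentences) if sentences else 0.0
    return {"score": score, "inconsistent": flagged}


def toy_judge(source: str, sentence: str) -> bool:
    """Toy stand-in for an LLM: every number in the sentence must occur in the source."""
    return set(_NUMBER_RE.findall(sentence)) <= set(_NUMBER_RE.findall(source))


source = "Net income was 2.1 billion dollars in 2022. The dividend was raised by 5%."
summary = "Net income reached 2.1 billion dollars. The dividend was raised by 8%."

result = check_summary(source, summary, toy_judge)
print(result["score"])         # 0.5
print(result["inconsistent"])  # ['The dividend was raised by 8%.']
```

Passing the judge as a parameter keeps the loop independent of any particular model, so an LLM-backed judge can replace the toy one without changing `check_summary`.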

Directory Tree

│   .gitignore
│   LICENSE.txt
│   README.md
│
├───config
│       config.sh
│       requirements.txt
│
├───data
│       10summary_with_result.csv
│       falsified_summary.csv
│       falsified_summary_level.csv
│       final_version_cropped_first1000.csv
│       final_version_withouttext.csv
│
├───doc
│   ├───About_Us
│   │       Team's Bio.pdf
│   │
│   ├───Academic Paper
│   │       5054_factuality_enhanced_language_m.pdf
│   │       Evaluating Factuality.pdf
│   │       Evaluating the Factual Consistency.pdf
│   │
│   ├───Project Description
│   │       Benchmarking LLM .pdf
│   │       CAPSTONE PROJECT PROPOSAL Fidelity Summarization Metrics.pdf
│   │
│   └───Report
│           Capstone Project Initial Due Diligence Report.pdf
│           F23_Fidelity_Benchmarking LLM_1st_report.pdf
│           F23_Fidelity_Benchmarking LLM_final_report.pdf
│           F23_Fidelity_BenchmarkLLM_poster.pdf
│           Project Proposal.pdf
│
├───res
│   │   10levels.svg
│   │   good_to_bad.svg
│   │   LLM_Assisted_Framework.jpg
│   │   NER_Framework.jpg
│   │
│   └───Baseline
│           Boxplot_for_Scores.png
│
├───samples
│       documents_extraction.ipynb
│       presentation.ipynb
│       summary_level_with_result.csv
│
├───src
│   │   Bart.py
│   │   PaLM.py
│   │   pipeline.py
│   │   summary_generation.py
│ 
│
└───test
    ├───Data_Pipeline
    └───Summary_Generation
            bart.ipynb
            llama2.ipynb
            PaLM2.ipynb
            test.py

Getting Started

Built with: Python, PyTorch, scikit-learn, NumPy, Pandas

Dependencies

python==3.10.0
ipython==8.15.0
nltk==3.8.1
numpy==1.24.3
openai==1.3.7
pandas==1.5.3
python-dotenv==1.0.0
rouge_score==0.1.2
scikit_learn==1.2.2
sentence_transformers==2.2.2
spacy==3.7.2
stanza==1.6.1

Configuration

Shell Script

1. Environment setup

Set up with a Python virtual environment:

bash ./config/config.sh

Set up with conda:

conda install --file ./config/requirements.txt

2. OpenAI API setup

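import os
import sys

# Make the modules in src/ importable when running from a sibling directory.
sys.path.append('../src/')
import pipeline

# Replace the placeholder with your own key, and keep it out of version control.
os.environ['OPENAI_API_KEY'] = 'Your OpenAI API Key'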
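
To avoid hard-coding the key in a notebook, one option is to read it from the environment and fail early when it is missing. The helper below is a hypothetical sketch, not part of the repository. The dependency list includes python-dotenv, which can populate the environment from a local `.env` file before this check runs, but that step is left out here to keep the example free of third-party imports.

```python
import os


def require_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell or add it to a .env file."
        )
    return key
```

Calling `require_openai_key()` at the top of a notebook surfaces a missing key immediately instead of at the first API request.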

Usage

Jupyter Notebook

The data extraction process is in samples/documents_extraction.ipynb.

The demo and a comparison of the results against baseline metrics are in samples/presentation.ipynb.

Report

LaTeX

License

See LICENSE.txt.

Group Members

Cong Chen (cc4887)

Longxiang Zhang (lz2869)

Ruolan Lin (rl3312)

Taichen Zhou (tz2555)

Yichen Huang (yh3550) - Team Captain

Fidelity Mentors

Lilli Ann Rowan, Indraneel Biswas, Michael Threlfall, and Diana Kulmizev

Instructor/CA

Adam Kelleher
