

CASE

The trained models are not yet public because of the double-blind review process. The final version of the repository will link to these models on Hugging Face.

BERTPreTraining.py expects the data path to point to a directory containing the documents in .parquet format. This code can be adapted to suit one's requirements.
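As a rough illustration of what "a directory of .parquet documents" means in practice, the sketch below collects the files and concatenates them into one DataFrame. The helper names (find_parquet_files, load_parquet_dir) are hypothetical and not taken from BERTPreTraining.py; it assumes a flat directory and that pandas with a Parquet engine (pyarrow or fastparquet) is installed.

```python
from pathlib import Path


def find_parquet_files(data_dir: str) -> list[Path]:
    """Return the .parquet files directly inside data_dir, sorted by name.

    Hypothetical helper; assumes a flat directory (no recursion).
    """
    directory = Path(data_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"{data_dir!r} is not a directory")
    return sorted(p for p in directory.glob("*.parquet") if p.is_file())


def load_parquet_dir(data_dir: str):
    """Concatenate every .parquet file in data_dir into one DataFrame.

    Assumes pandas plus a Parquet engine (pyarrow or fastparquet) is installed.
    """
    import pandas as pd  # imported lazily so find_parquet_files works without pandas

    files = find_parquet_files(data_dir)
    if not files:
        raise FileNotFoundError(f"no .parquet files found in {data_dir!r}")
    frames = [pd.read_parquet(f) for f in files]
    return pd.concat(frames, ignore_index=True)
```

Sorting the file list keeps the row order reproducible across runs, which helps when comparing pre-training runs.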

All other scripts expect the data path of a single CSV file containing the respective data.

  • In BERTFineTuning.py, the source column is 'post', as set in the custom dataset. The target column contains binary values (0 or 1), and the column is named after the disorder it indicates.
  • In GemmaTraining.py, the pre_training variable must be set to False for fine-tuning. For pre-training, the source column in the dataset is 'TEXT'; for fine-tuning, it is 'Post', and the target column is 'Generated Diagnosis Summary'. This script was adapted from https://colab.research.google.com/github/adithya-s-k/LLM-Alchemy-Chamber/blob/main/LLMs/Gemma/finetune-gemma.ipynb
  • In GemmaValidation.py, the data path is set to the file generated by GemmaTraining.py after fine-tuning. This file should contain a 'Predicted Diagnosis' column, generated by the model, and a 'Generated Diagnosis Summary' column, the reference summary obtained from the GPT-3.5 annotations. In addition, to calculate the BARTScore, the BARTScore repository (https://github.com/neulab/BARTScore) must be cloned and its checkpoint downloaded.
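For the BERTFineTuning.py layout described above (a 'post' text column plus one 0/1 column named after a disorder), a minimal stdlib sketch of reading the CSV might look like this. It is not the script's custom dataset; the function name and the example disorder column 'depression' are hypothetical, and in the actual script the rows would presumably be wrapped in a PyTorch-style dataset and tokenized.

```python
import csv
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    label: int


def load_binary_examples(csv_path: str, target_column: str,
                         source_column: str = "post") -> list[Example]:
    """Read (post, label) pairs for one disorder from a CSV file.

    target_column is the disorder name (e.g. the hypothetical 'depression')
    and must hold binary labels; '1' and '1.0' are both accepted.
    """
    examples = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = {source_column, target_column} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"missing column(s): {sorted(missing)}")
        for row in reader:
            value = float(row[target_column])
            if value not in (0.0, 1.0):  # reject anything that is not 0 or 1
                raise ValueError(f"non-binary label {row[target_column]!r}")
            examples.append(Example(text=row[source_column], label=int(value)))
    return examples
```

Validating the labels up front surfaces a wrongly chosen target column before any training time is spent.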
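The pre_training switch in GemmaTraining.py changes which columns are read. The sketch below shows one way such a switch could map to column names and to a training string. Only the column names come from this README; the function names and the fine-tuning prompt template are hypothetical and not the template the script actually uses.

```python
from typing import Optional

# Column names as described for GemmaTraining.py.
PRETRAIN_SOURCE = "TEXT"
FINETUNE_SOURCE = "Post"
FINETUNE_TARGET = "Generated Diagnosis Summary"


def select_columns(pre_training: bool) -> tuple[str, Optional[str]]:
    """Return (source_column, target_column); pre-training has no target."""
    if pre_training:
        return PRETRAIN_SOURCE, None
    return FINETUNE_SOURCE, FINETUNE_TARGET


def build_training_text(row: dict, pre_training: bool) -> str:
    """Turn one CSV row into a training string.

    The fine-tuning template below is a hypothetical example only.
    """
    source_col, target_col = select_columns(pre_training)
    if target_col is None:
        return row[source_col]  # pre-training: raw text, no target
    return (
        "Post:\n" + row[source_col].strip()
        + "\n\nDiagnosis summary:\n" + row[target_col].strip()
    )
```

For example, build_training_text(row, pre_training=False) expects the row to have both 'Post' and 'Generated Diagnosis Summary' keys, mirroring the requirement that pre_training be False for fine-tuning.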
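A possible shape for the GemmaValidation.py input handling is sketched below: it reads the 'Predicted Diagnosis' and 'Generated Diagnosis Summary' columns and averages any pairwise metric over the rows. The function names are hypothetical, and the scorer is passed in as a callable so the sketch does not depend on BARTScore being installed.

```python
import csv
from statistics import mean
from typing import Callable, List

PRED_COL = "Predicted Diagnosis"
REF_COL = "Generated Diagnosis Summary"


def load_prediction_pairs(csv_path: str) -> tuple[List[str], List[str]]:
    """Read model predictions and reference summaries from the output CSV."""
    predictions, references = [], []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = {PRED_COL, REF_COL} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"missing column(s): {sorted(missing)}")
        for row in reader:
            predictions.append(row[PRED_COL])
            references.append(row[REF_COL])
    return predictions, references


def average_score(score_fn: Callable[[List[str], List[str]], List[float]],
                  predictions: List[str], references: List[str]) -> float:
    """Average a pairwise metric that returns one score per (pred, ref) pair."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    return mean(score_fn(predictions, references))
```

As we understand the BARTScore README, the cloned repository provides a BARTScorer class (for example BARTScorer(device="cuda:0", checkpoint="facebook/bart-large-cnn")) with a load(path=...) method for the downloaded checkpoint and a score(srcs, tgts, batch_size=...) method returning one score per pair, so functools.partial(scorer.score, batch_size=4) could serve as score_fn. Check the repository for the exact interface and for which scoring direction (prediction to reference, or the reverse) matches the intended evaluation.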

The Python environment can be set up using the requirments.txt file.

Contributors: mncssj4x, sarthakharne