logion's Introduction

Logion is a system for detecting errors in ancient and medieval Greek works. This document describes the training procedure for a premodern Greek BERT model and how to use it for error detection. We provide BERT training code for those interested in replicating our training or using their own data. For those who want to use our premodern Greek BERT model out of the box, the model, which was trained on over 70 million words of premodern Greek, is available with usage instructions at https://huggingface.co/cabrooks/LOGION-base.
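As a rough illustration of out-of-the-box use, the sketch below masks one word of a passage and asks the model for its most probable replacements. It assumes the `transformers` and `torch` packages are installed and that the checkpoint loads with the standard BERT classes; the helper names `build_masked_text` and `top_predictions` are hypothetical and are not part of this repo. A single mask can only propose single-token replacements, so this is not the full error detection procedure.

```python
def build_masked_text(words, index, mask_token="[MASK]"):
    """Return the words joined by spaces, with the word at `index` masked."""
    masked = list(words)
    masked[index] = mask_token
    return " ".join(masked)


def top_predictions(words, index, k=5, model_name="cabrooks/LOGION-base"):
    """Return the k most probable single-token fills for the masked word.

    Hypothetical helper: downloads the checkpoint on first use, so it needs
    network access. The heavy imports are kept local so that the pure helper
    above can be used without them.
    """
    import torch
    from transformers import BertForMaskedLM, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertForMaskedLM.from_pretrained(model_name)
    model.eval()

    text = build_masked_text(words, index, tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")

    # Locate the mask token in the encoded input.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_position = is_mask.nonzero(as_tuple=True)[0][0]

    with torch.no_grad():
        logits = model(**inputs).logits

    probs = logits[0, mask_position].softmax(dim=-1)
    top = torch.topk(probs, k)
    return [
        (tokenizer.convert_ids_to_tokens(int(token_id)), float(prob))
        for prob, token_id in zip(top.values, top.indices)
    ]


if __name__ == "__main__":
    # Opening words of the Iliad, used here only as sample input.
    words = "μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος".split()
    for token, prob in top_predictions(words, index=2):
        print(f"{token}\t{prob:.4f}")
```

A low probability for the word actually transmitted in the text, next to a high probability for some alternative, is the kind of signal an error detection report can build on.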

This model can also be fine-tuned on specific works of interest to better suit a given task or trained from scratch (see train_example.py).

For details on using the code in this repo to generate error detection reports on works of your choice, follow the instructions in this document.

For more information about our group, see https://www.logionproject.princeton.edu.

Barbara Graziosi (1), Johannes Haubold (1), Charlie Cowen-Breen (2), Creston Brooks (3)
(1) Department of Classics, Princeton University [email protected]; [email protected]
(2) Department of Pure Mathematics and Mathematical Statistics, University of Cambridge [email protected] / [email protected]
(3) Department of Computer Science, Princeton University [email protected]

System requirements

A GPU is recommended, but not required, for performing inference with Logion. On a system with Python >=3.8.8 and Conda >=4.10.1, run

>> conda create --name logion pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
>> conda activate logion

to initialize the environment.
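Once the environment is active, a quick check like the following can confirm that the interpreter meets the Python requirement stated above and report whether PyTorch can see a GPU. This is a minimal sketch; the function names `meets_minimum` and `environment_report` are hypothetical and not part of this repo.

```python
import importlib.util
import sys

# Minimum Python version stated in the system requirements.
MIN_PYTHON = (3, 8, 8)


def meets_minimum(version, minimum):
    """Compare two version tuples such as (3, 8, 8)."""
    return tuple(version) >= tuple(minimum)


def environment_report():
    """Summarize whether this interpreter looks ready for inference."""
    report = {
        "python_ok": meets_minimum(sys.version_info[:3], MIN_PYTHON),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install is reported as "no GPU" rather than crashing.
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

If `cuda_available` is false, inference still runs on the CPU, only more slowly.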

System recommendations for training

Logion was trained for several days on a research computing cluster with 2.8 GHz Intel Ice Lake nodes. If you intend to fine-tune, we recommend a system with at least 128 GB of memory and a GPU. With an NVIDIA K80 or T4 (standard on Google Colab), beam search should take no more than 10 seconds for spans of up to 10 tokens under the current specifications.
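For readers unfamiliar with the term, the sketch below shows generic beam search over a span of tokens. It is not Logion's implementation: the scoring function `toy_next_log_probs` is a hypothetical stand-in for a call to the language model, and the beam width is an arbitrary choice made for illustration.

```python
import heapq
import math


def beam_search(next_log_probs, span_length, beam_width=3):
    """Fill a span left to right, keeping the `beam_width` best partial fills.

    `next_log_probs(prefix)` must return a dict that maps each candidate
    token to its log-probability given the tokens chosen so far.
    Returns a list of (total_log_prob, tokens) pairs, best first.
    """
    beams = [(0.0, [])]
    for _ in range(span_length):
        candidates = []
        for score, tokens in beams:
            for token, log_prob in next_log_probs(tokens).items():
                candidates.append((score + log_prob, tokens + [token]))
        # Keep only the highest-scoring partial sequences.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams


def toy_next_log_probs(prefix):
    """Hypothetical scorer over a three-token vocabulary."""
    if prefix and prefix[-1] == "a":
        probs = {"a": 0.1, "b": 0.8, "c": 0.1}
    else:
        probs = {"a": 0.6, "b": 0.3, "c": 0.1}
    return {token: math.log(p) for token, p in probs.items()}


if __name__ == "__main__":
    for score, tokens in beam_search(toy_next_log_probs, span_length=2, beam_width=2):
        print(tokens, round(math.exp(score), 2))
```

In this sketch the scorer is called once per surviving beam at every position, so the cost grows with both the span length and the beam width, which is why longer spans take longer to fill.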
