
OTAlign: Optimal Transport based Monolingual Word Alignment

[Figure: alignment example by OTAlign]

Prerequisite

  • Install the dependencies listed in src/requirements.txt
  • Obtain the word alignment datasets: MultiMWA, Edinburgh++, and MSR-RTE
    • Place them in the data/ directory
    • Preprocessing scripts for Edinburgh++ and MSR-RTE are in src/preprocess

Unsupervised Word Alignment

For details, please refer to the arguments in src/unsupervised_alignment.py

UN_OUTDIR=../out/unsupervised/
SEED=42
DATA=mtref
OT=uot
WT=uniform
DT=cos
$ python unsupervised_alignment.py --data $DATA --sure_and_possible --model bert-base-uncased --centering --pair_encode --layer -3 --out $UN_OUTDIR --ot_type $OT --weight_type $WT --dist_type $DT --seed $SEED
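The option values above appear to select unbalanced optimal transport (`--ot_type uot`), uniform word weights (`--weight_type uniform`) and cosine distance (`--dist_type cos`). To illustrate what that combination computes, here is a minimal pure-Python sketch. It is not the repository's implementation: it assumes each word is already a vector (the script itself uses BERT embeddings), solves entropic unbalanced OT with KL-relaxed marginals via Sinkhorn-style scaling, and keeps word pairs whose transported mass exceeds a threshold. The function names and the `reg`, `reg_m` and `threshold` values are hypothetical.

```python
import math


def cosine_distance_matrix(src, tgt):
    """Return 1 - cosine similarity for every (source, target) vector pair."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors

    return [
        [1.0 - sum(x * y for x, y in zip(s, t)) / (norm(s) * norm(t)) for t in tgt]
        for s in src
    ]


def sinkhorn_unbalanced(a, b, cost, reg=0.1, reg_m=1.0, n_iter=200):
    """Entropic unbalanced OT: marginals are only softly enforced by a KL penalty (reg_m)."""
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    # A finite reg_m gives an exponent < 1, so total mass need not be preserved.
    power = reg_m / (reg_m + reg)
    for _ in range(n_iter):
        kv = [sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / kv[i]) ** power for i in range(n)]
        ktu = [sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / ktu[j]) ** power for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def align(src_vecs, tgt_vecs, threshold=0.1, **solver_kwargs):
    """Hypothetical aligner: uniform word weights, cosine cost, thresholded plan."""
    a = [1.0 / len(src_vecs)] * len(src_vecs)
    b = [1.0 / len(tgt_vecs)] * len(tgt_vecs)
    cost = cosine_distance_matrix(src_vecs, tgt_vecs)
    plan = sinkhorn_unbalanced(a, b, cost, **solver_kwargs)
    return [(i, j) for i, row in enumerate(plan) for j, p in enumerate(row) if p > threshold]
```

For example, `align([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])` returns `[(0, 0), (1, 1)]`. Because the marginals are relaxed rather than fixed, a word with no good counterpart can end up with little transported mass and stay unaligned, which is the motivation for unbalanced OT in this setting.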

Supervised Word Alignment

For details, please refer to the arguments in src/supervised_alignment.py

Note: Supervised word alignment uses hyperparameters estimated in the unsupervised setting, so you must run unsupervised word alignment first.

SU_OUTDIR=../out/supervised/
BATCH=64
PATIENCE=5

$ python supervised_alignment.py --batch $BATCH --out $SU_OUTDIR --data $DATA --sure_and_possible --model bert-base-uncased --ot_type $OT --weight_type $WT --dist_type $DT --seed $SEED --patience $PATIENCE --unsupervised_dir $UN_OUTDIR
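Because the supervised stage depends on the output of the unsupervised stage, it can help to drive both from one script. The sketch below is hypothetical and not part of the repository: it rebuilds the two commands from this README with Python's `subprocess` module, reusing only flags that appear here. It assumes it is run from `src/` and that `--unsupervised_dir` should point at the unsupervised output directory (`UN_OUTDIR`).

```python
import subprocess
import sys

# Hypothetical driver; values mirror the shell variables in this README.
DATA, OT, WT, DT, SEED = "mtref", "uot", "uniform", "cos", "42"
UN_OUTDIR = "../out/unsupervised/"
SU_OUTDIR = "../out/supervised/"
SHARED = ["--data", DATA, "--sure_and_possible", "--model", "bert-base-uncased",
          "--ot_type", OT, "--weight_type", WT, "--dist_type", DT, "--seed", SEED]


def unsupervised_cmd():
    """Command for the unsupervised stage (same flags as the README example)."""
    return [sys.executable, "unsupervised_alignment.py", *SHARED,
            "--centering", "--pair_encode", "--layer", "-3", "--out", UN_OUTDIR]


def supervised_cmd(batch=64, patience=5):
    """Command for the supervised stage, pointed at the unsupervised results."""
    return [sys.executable, "supervised_alignment.py", *SHARED,
            "--batch", str(batch), "--patience", str(patience),
            "--out", SU_OUTDIR, "--unsupervised_dir", UN_OUTDIR]


def run_pipeline():
    """Run both stages in order; check=True aborts before stage two if stage one fails."""
    for cmd in (unsupervised_cmd(), supervised_cmd()):
        subprocess.run(cmd, check=True)
```

Call `run_pipeline()` from inside `src/` once the datasets are in place. Flag order differs from the shell commands, which does not matter for standard command-line parsing.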

Citation

Please cite our ACL 2023 paper if you use this repository:

Yuki Arase, Han Bao, and Sho Yokoi. Unbalanced Optimal Transport for Unbalanced Word Alignment, in Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL 2023), (July 2023, to appear).

Contact

If you have any questions about the code in this repository, please contact Yuki Arase via email or simply open an issue 💬

