Cross-Linguistic Assessment of Models on Syntax

Source code and instructions for replicating the CLAMS paper (Cross-Linguistic Syntactic Evaluation of Word Prediction Models; Mueller et al., ACL 2020). This repository is a fork of aaronmueller/clams.

In this repository, we provide the following:

  • Our data sets for training and testing the LSTM language models
  • Syntactic evaluation sets
    • English, French, German, Hebrew, Russian
  • Attribute-varying grammars (AVGs)
  • Code for replicating the CLAMS paper's experiments (described in the sections below)

Data Sets

To replicate the multilingual corpora, simply concatenate the training, validation, and test corpora for each language. The multilingual vocabulary is the concatenation of each language's monolingual vocabulary.
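
For example, the concatenation can be done in a few lines of Python (a sketch; the per-language paths such as data/en/train.txt are assumptions about the layout, not guaranteed by the repository):

import os

langs = ["en", "fr", "de", "ru", "he"]
os.makedirs("data/multi", exist_ok=True)

# Concatenate each language's corpora into a multilingual corpus.
for split in ["train", "valid", "test"]:
    with open(f"data/multi/{split}.txt", "w", encoding="utf-8") as out:
        for lang in langs:
            with open(f"data/{lang}/{split}.txt", encoding="utf-8") as f:
                out.write(f.read())

# The multilingual vocabulary is likewise the concatenation of the
# monolingual vocabularies.
with open("data/multi/vocab.txt", "w", encoding="utf-8") as out:
    for lang in langs:
        with open(f"data/{lang}/vocab.txt", encoding="utf-8") as f:
            out.write(f.read())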

Attribute-Varying Grammars

These are used to generate syntactic evaluation sets by varying attributes, producing sets of grammatical and ungrammatical examples in a controlled manner.

The behavior of this system is defined in grammar.py. The idea is quite similar to context-free grammars, but with an added vary statement that specifies which preterminals and attributes to vary in order to generate the desired incorrect examples. See the CLAMS paper for more detail.
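
To make the idea concrete, here is a toy illustration in plain Python (a conceptual sketch only; grammar.py defines its own grammar and vary syntax, which this does not reproduce):

# Toy attribute-varying generation: expand a template, then vary one
# attribute (here, verb number) to produce a minimally different
# ungrammatical counterpart for each grammatical sentence.
subjects = {"sg": "the author", "pl": "the authors"}
verbs = {"sg": "laughs", "pl": "laugh"}

pairs = []
for number in ("sg", "pl"):
    grammatical = f"{subjects[number]} {verbs[number]}"
    varied = "pl" if number == "sg" else "sg"  # vary the verb's number
    ungrammatical = f"{subjects[number]} {verbs[varied]}"
    pairs.append((True, grammatical))
    pairs.append((False, ungrammatical))

for label, sentence in pairs:
    print(f"{label}\t{sentence}")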

The generation procedure we use is defined in generator.py. We give the script a directory of grammars, wherein each file contains one syntactic test case. We also define a common.avg grammar for each language, which contains terminals shared by all other grammars in the directory. You can also check whether all tokens in your grammars are contained in your language model's vocabulary by using the --check_vocab argument, which takes a text file of line-separated tokens.

Example usage:

python generator.py --grammars fr_replication --check_vocab data/fr/vocab.txt

Syntactic Evaluation Sets

The evaluation sets we use in the CLAMS paper are provided in the *_replication_evalset folders. They are formatted as tab-separated tables, where the first column is a Boolean indicating whether the sentence is grammatical and the second column is the sentence itself.
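
Loading such a file takes only a few lines (a sketch; the filename is a hypothetical example):

import csv

# Read a two-column TSV evaluation set: a Boolean grammaticality label,
# then the sentence.
with open("fr_replication_evalset/simple_agrmt", encoding="utf-8") as f:
    rows = [(label == "True", sentence)
            for label, sentence in csv.reader(f, delimiter="\t")]

grammatical = [s for ok, s in rows if ok]
ungrammatical = [s for ok, s in rows if not ok]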

Note that the AVGs generate examples with a minimal amount of preprocessing: most tokens are lowercased, and by default the examples contain no punctuation or end-of-sentence markers. This is meant to keep them modular. We provide a preproc.py script that changes the format of the examples to better fit our training domain; modify it to make the evaluation sets look more like your training sets (if you so choose). We use the --eos setting to obtain the results in Table 2 of our paper, and both the --eos and --capitalize settings to obtain the results in Table 4. The postproc.sh script simply renames the files generated by preproc.py so that they replace the original unprocessed files.

Example usage:

python preproc.py --evalsets fr_replication_evalset --eos
./postproc.sh fr_replication_evalset
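
For intuition, the two settings amount to roughly the following transformations (a sketch of the effect only; preproc.py handles the actual details):

# Rough effect of the --eos and --capitalize settings on one example.
# <eos> is the end-of-sentence token used by this family of LM code.
def preprocess(sentence, eos=False, capitalize=False):
    if capitalize:
        sentence = sentence[0].upper() + sentence[1:]
    if eos:
        sentence = sentence + " <eos>"
    return sentence

print(preprocess("the author laughs", eos=True, capitalize=True))
# -> The author laughs <eos>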

Language Model Training and Evaluation

Training and Testing LMs

Requirements:

  • Python 3.6.9+
  • PyTorch 1.1.0
  • CUDA 9.0

We modify the code of van Schijndel, Mueller & Linzen (2019), which itself is a modification of the code from Marvin & Linzen (2018). This code was written to run on a particular SLURM-enabled grid setup. We highly encourage pull requests containing code updated to run on more recent PyTorch/CUDA versions, as well as code meant to run on more types of systems.

To train an LSTM language model, run train_{en,fr,de,ru,he}.sh in LM_syneval/example_scripts.

To obtain model perplexities on a test corpus, run the following (in LM_syneval/word-language-model):

python main.py --test --lm_data $corpus_dir/ --save models/$model_pt --save_lm_data models/$model_bin --testfname $test_file

There is a test script in LM_syneval/example_scripts.

Obtaining Word Scores on Syntactic Evaluation Sets

To obtain word-by-word model surprisals on the syntactic evaluation sets, run the following (in LM_syneval/word-language-model):

./test.sh $evalset_dir $model_dir $test_case

To evaluate on every test case in a directory of evaluation sets, pass all as the $test_case argument.

The above script outputs a series of files with the extension .$model_dir.wordscores in the same directory as the evaluation sets.

Then, to analyze these word-by-word scores and obtain scores per-case, run the following (in LM_syneval/word-language-model):

python analyze_results.py --score_dir $score_dir --case $case

where $score_dir is a directory containing .wordscores files, and $case refers to the syntactic evaluation case (e.g., obj_rel_across_anim). By default, --case is all, which gives scores for every stimulus type in the specified directory.

By default, the above script compares the probability of entire grammatical and ungrammatical sentences when obtaining accuracies. To calculate accuracies based solely on the individual varied words, pass the --word_compare argument to analyze_results.py.
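
Conceptually, the two comparison modes reduce to the following (a sketch; analyze_results.py implements this over the .wordscores files, whose exact format is not shown here):

# Surprisal is -log p(word | context), so summing over a sentence gives
# its total surprisal, and lower total surprisal means higher probability.
def sentence_correct(gram_surprisals, ungram_surprisals):
    # Default mode: the grammatical sentence should be more probable,
    # i.e. have lower summed surprisal, than its ungrammatical twin.
    return sum(gram_surprisals) < sum(ungram_surprisals)

def word_correct(gram_target_surprisal, ungram_target_surprisal):
    # --word_compare mode: compare only the varied words' surprisals.
    return gram_target_surprisal < ungram_target_surprisal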

Modified BERT-Syntax Code for mBERT

We provide a very slightly modified version of Yoav Goldberg's BERT-Syntax code. Additionally, we provide scripts for pre-processing the syntactic evaluation sets generated by AVGs into the format required by BERT-Syntax.

The model loaded in eval_bert.py is now bert-base-multilingual-cased. Additionally, the script is now able to handle input other than cases from the English Marvin & Linzen set.
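
The scoring idea behind BERT-syntax is to mask the position of the varied word and compare the model's probabilities for the grammatical and ungrammatical candidates. Below is a minimal sketch using the modern HuggingFace transformers API (an assumption for illustration; eval_bert.py itself builds on Goldberg's original code, which predates that library):

import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

def prefers_grammatical(prefix, good, bad, suffix=""):
    # Mask the varied word, then compare the logits of the two candidate
    # words at the masked position (both must be single wordpieces).
    text = f"{prefix} {tok.mask_token} {suffix}".strip()
    inputs = tok(text, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tok.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    good_id = tok.convert_tokens_to_ids(good)
    bad_id = tok.convert_tokens_to_ids(bad)
    return logits[good_id] > logits[bad_id]

print(prefers_grammatical("the author", "laughs", "laugh", "."))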

To pre-process an evaluation set for BERT or mBERT, copy the make_for_bert.py script to the folder containing the evaluation set and then run it from that directory. This will produce a forbert.tsv file which you can then pass as input to the eval_bert.py script.

Example usage:

python eval_bert.py marvin > results/${lang}_results_multiling.txt

Licensing

CLAMS is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
