This project forked from nelson-liu/contextual-repr-analysis

A toolkit for evaluating the linguistic knowledge and transferability of contextual representations. Code for "Linguistic Knowledge and Transferability of Contextual Representations", to appear at NAACL 2019.

Home Page: http://nelsonliu.me/papers/liu+gardner+belinkov+peters+smith.naacl2019.pdf

contextual-repr-analysis

A toolkit for evaluating the linguistic knowledge and transferability of contextual word representations. Code for Linguistic Knowledge and Transferability of Contextual Representations, to appear at NAACL 2019.

For a description of the included tasks, see TASKS.md.

Installation

This project is being developed in Python 3.6, and CI runs the tests in Python 3.6 as well (via TravisCI).

Conda will set up a virtual environment with the exact version of Python used for development along with all the dependencies needed to run the code.

  1. Download and install Conda.

  2. Change your directory to your clone of this repo.

    cd contextual-repr-analysis

  3. Create a Conda environment with Python 3.6.

    conda create -n contextual_repr_analysis python=3.6

  4. Now activate the Conda environment. You will need to activate the Conda environment in each terminal in which you want to run code from this repo.

    source activate contextual_repr_analysis

  5. Install the required dependencies.

    pip install -r requirements.txt

You should now be able to test your installation with py.test -v. Congratulations!

Getting Started: Evaluating Representations

This section walks through an example of evaluating ELMo on the English Web Treebank (EWT) POS tagging task.

Step 1. Precomputing the Word Representations

The easiest way to get started with evaluating your representations is to precompute representations for each word in each sentence in the evaluation dataset. In each of the data/ directories, there exists a text file of sentences (newline delimited, and tokens are space-delimited). These are the sentences used during training and evaluation, so getting representations for these should be enough. If you write a new DatasetReader and want to generate these sentences, use the script at ./scripts/get_config_sentences.py (see python ./scripts/get_config_sentences.py -h for more information on usage).
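The sentence-file layout described above (one sentence per line, tokens separated by single spaces) can be read with a few lines of Python. This is a minimal sketch rather than code from this repo: `parse_sentences` is a hypothetical helper, and skipping blank lines is an assumption, since the text does not say whether the files contain any.

```python
from typing import List


def parse_sentences(text: str) -> List[List[str]]:
    """Split newline-delimited sentences into space-delimited tokens.

    Blank lines are skipped; this is an assumption made for the sketch.
    """
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            sentences.append(line.split(" "))
    return sentences


# A tiny in-memory example standing in for one of the sentence files.
example_text = "The cat sat .\nDogs bark .\n"
for tokens in parse_sentences(example_text):
    # Each sentence needs one representation vector per token.
    print(len(tokens), tokens)
```

The token count of each sentence is the sequence length that the precomputed representations for that sentence should have.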

Store the precomputed representations in an HDF5 file with the following format:

  1. The keys should be numbers (represented as strings), corresponding to line numbers.

  2. The value associated with each key is expected to be a numpy array of word representations. Acceptable shapes are (sequence_length, representation_dim) or (num_layers, sequence_length, representation_dim).

  3. Another key, the string value "sentence_to_index", should store a string-serialized JSON dictionary mapping from sentences (the sentences that the representations in the values are calculated from) to string numbers (the other keys of the HDF5 file).
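The three requirements above can be sketched with the standard library alone, before any HDF5 file is written. Both helpers below are hypothetical and not part of this repo. Two assumptions are made for illustration: keys are numbered from zero, and `sequence_length` equals the number of space-delimited tokens in the sentence.

```python
import json
from typing import Dict, List, Tuple


def build_sentence_to_index(sentences: List[str]) -> Dict[str, str]:
    """Map each sentence to its position in the list, as a string key.

    Zero-based numbering is an assumption of this sketch. Identical
    sentences collapse to a single entry (the last position wins).
    """
    return {sentence: str(i) for i, sentence in enumerate(sentences)}


def shape_is_acceptable(shape: Tuple[int, ...], sentence: str) -> bool:
    """Check a representation shape against the two accepted layouts."""
    num_tokens = len(sentence.split(" "))
    if len(shape) == 2:
        # (sequence_length, representation_dim)
        return shape[0] == num_tokens
    if len(shape) == 3:
        # (num_layers, sequence_length, representation_dim)
        return shape[1] == num_tokens
    return False


sentences = ["The cat sat .", "Dogs bark ."]
sentence_to_index = build_sentence_to_index(sentences)
# This JSON string is what the "sentence_to_index" key would hold.
print(json.dumps(sentence_to_index))
print(shape_is_acceptable((4, 1024), sentences[0]))
print(shape_is_acceptable((3, 3, 1024), sentences[1]))
```

Running a check like this over every array before writing the file catches mismatches between the tokenization used to compute the representations and the tokenization in the sentence file.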

If you have a Dict[str, str] named sentence_to_index and a Dict[str, numpy.ndarray] named vectors that maps string numbers to numpy arrays of representations (a dictionary with consecutive numbers as keys and numpy arrays as values), you can pass both dictionaries, together with the path of the file to write, to the following function to produce an HDF5 file in the proper format.

import json

import h5py


def make_hdf5_file(sentence_to_index, vectors, output_file_path):
    """Write precomputed representations to an HDF5 file in the expected format."""
    with h5py.File(output_file_path, 'w') as fout:
        # One dataset per sentence, keyed by its line number as a string.
        for key, embeddings in vectors.items():
            fout.create_dataset(
                str(key),
                embeddings.shape, dtype='float32',
                data=embeddings)
        # Store the sentence-to-key mapping as a single JSON string.
        sentence_index_dataset = fout.create_dataset(
            "sentence_to_index",
            (1,),
            dtype=h5py.special_dtype(vlen=str))
        sentence_index_dataset[0] = json.dumps(sentence_to_index)

Step 2. Creating the experiment configuration

./experiment_configs contains all the experiment configurations used in this project. TODO (nfliu): Write more about writing your own experiment config.

Step 3. Training the probing model

To train a probing model on top of the precomputed word representations, we use the allennlp train command.

Given a single configuration file, we can train with:

allennlp train <config_path> -s <model_save_path> --include-package contexteval

For example, to train a probing model on the topmost ELMo layer (the default) for POS tagging:

allennlp train experiment_configs/elmo_original/ewt_pos_tagging.json \
    -s ewt_pos_tagging_topmost_layer \
    --include-package contexteval

Note that the precomputed contextualizers in the experiment config do not have a layer specified. This causes models to default to using the topmost layer. To train on, say, the first layer (index 0), you can run this command:

allennlp train experiment_configs/elmo_original/ewt_pos_tagging.json \
    -s models/elmo_original/ewt_pos_tagging_layer_0 --include-package contexteval \
    --overrides '{"dataset_reader": {"contextualizer": {"layer_num": 0}}, "validation_dataset_reader": {"contextualizer": {"layer_num": 0}}}'

To train on all layers, one-by-one, you can wrap the above in a bash for-loop.

for i in 0 1 2; do allennlp train experiment_configs/elmo_original/ewt_pos_tagging.json \
    -s models/elmo_original/ewt_pos_tagging_layer_${i} --include-package contexteval \
    --overrides '{"dataset_reader": {"contextualizer": {"layer_num": '${i}'}}, "validation_dataset_reader": {"contextualizer": {"layer_num": '${i}'}}}'; done
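The same per-layer loop can also be sketched in Python, which avoids hand-quoting the JSON passed to --overrides. This is a minimal sketch, not code from this repo: `train_command` is a hypothetical helper, and it only builds and prints the commands shown above rather than running them.

```python
import json
import shlex
from typing import List

CONFIG = "experiment_configs/elmo_original/ewt_pos_tagging.json"


def train_command(layer_num: int) -> List[str]:
    """Build the `allennlp train` argument list for one layer."""
    contextualizer = {"contextualizer": {"layer_num": layer_num}}
    # The same override is applied to both dataset readers.
    overrides = json.dumps({
        "dataset_reader": contextualizer,
        "validation_dataset_reader": contextualizer,
    })
    return [
        "allennlp", "train", CONFIG,
        "-s", "models/elmo_original/ewt_pos_tagging_layer_{}".format(layer_num),
        "--include-package", "contexteval",
        "--overrides", overrides,
    ]


for layer_num in range(3):
    # Print a shell-quoted line that can be pasted into a terminal.
    print(" ".join(shlex.quote(arg) for arg in train_command(layer_num)))
```

To execute a command instead of printing it, the argument list can be passed to `subprocess.run`.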

Step 4. Evaluating the probing model on test data

To evaluate a trained probing model on test data, use the allennlp evaluate command.

To evaluate the three models we trained above and log the output to a file, we can run:

for i in 0 1 2; do allennlp evaluate models/elmo_original/ewt_pos_tagging_layer_${i}/model.tar.gz \
    --evaluation-data-file ./data/pos/en_ewt-ud-test.conllu --cuda-device 0 \
    --include-package contexteval 2>&1 | tee models/elmo_original/ewt_pos_tagging_layer_${i}/evaluation.log; done
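The `2>&1 | tee` part of the command above merges stderr into stdout and writes the combined stream to both the terminal and a log file. A rough Python equivalent is sketched below. `run_and_tee` is a hypothetical helper, not part of this repo, and the demo uses a stand-in command so that it runs without AllenNLP installed.

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def run_and_tee(command: List[str], log_path: str) -> int:
    """Run a command, echoing its merged stdout/stderr and saving a copy."""
    with open(log_path, "w") as log_file, subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            universal_newlines=True) as process:
        for line in process.stdout:
            sys.stdout.write(line)  # show the line on the terminal
            log_file.write(line)    # and keep a copy, like tee
    return process.returncode


# Stand-in command; replace it with the `allennlp evaluate ...` argument
# list to log a real evaluation run.
demo_command = [sys.executable, "-c", "print('evaluation output')"]
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "evaluation.log")
    exit_code = run_and_tee(demo_command, log_path)
    with open(log_path) as saved_log:
        saved = saved_log.read()
```

Unlike the shell pipeline, this returns the exit code of the command itself, so a failed evaluation can be detected by the caller.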

References

@InProceedings{liu-gardner-belinkov-peters-smith:2019:NAACL,
  author    = {Liu, Nelson F.  and  Gardner, Matt  and  Belinkov, Yonatan  and  Peters, Matthew E.  and  Smith, Noah A.},
  title     = {Linguistic Knowledge and Transferability of Contextual Representations},
  booktitle = {Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  year      = {2019}
}
