chiennv2000 / dhgnet

This project forked from nutcrtnk/dhgnet

Code for paper "Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph", EMNLP 2021 - findings.

License: MIT License

DHGNet

Code repository for the Findings of EMNLP 2021 paper "Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph". [ACL] [arxiv]

Requirements

We tested the code on:

Other requirements:

  • numpy
  • pandas
  • scikit-learn
  • gensim
  • tqdm
  • nltk
  • pythainlp 2.3.1 (for Thai language tokenizer)
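
The list above can be checked against the current environment before running anything. The following is a minimal sketch, not part of the repository: check_requirements is a hypothetical helper, and it assumes the entries above are the package names as installed by pip. Only pythainlp is compared against a version, because it is the only entry with a version listed.

```python
from importlib import metadata

# Package names from the list above; only pythainlp has a version listed.
REQUIREMENTS = {
    "numpy": None,
    "pandas": None,
    "scikit-learn": None,
    "gensim": None,
    "tqdm": None,
    "nltk": None,
    "pythainlp": "2.3.1",
}


def check_requirements(requirements=REQUIREMENTS):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper, not part of the DHGNet code.
    """
    problems = []
    for dist, wanted in requirements.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        # Only compare versions when the list above names one.
        if wanted is not None and installed != wanted:
            problems.append(f"{dist} {installed} found, {wanted} listed")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

A version mismatch is reported rather than treated as an error, since the list only says which version was used, not that other versions fail.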

Usage

  1. Extract data/text_cls.zip to obtain the datasets.

  2. From the src folder, run one of the following commands to train and evaluate DHGNet_en.

    • For the Bosnian setting:
      python main.py bosnian --rnn_layers 1 --directed 0 --add_from_dict 30000 --name [output_model_name]

    • For the other settings (bengali, malayalam, tamil, thai_t, thai_w):
      python main.py [setting_name] --name [output_model_name]

    • For DHGNet_multi, add the command option --langs ar,en,es,fa,fr,zh to the command.
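
When running several settings in a row, it can help to assemble the command lines programmatically. The sketch below only restates the commands listed above. build_command and its multi parameter are hypothetical names introduced for illustration, and the script prints the commands rather than executing them. Combining --langs with the Bosnian options is an assumption; the list above does not say which settings it was used with.

```python
import shlex

# Setting names from the usage list above.
SETTINGS = ("bosnian", "bengali", "malayalam", "tamil", "thai_t", "thai_w")

# Extra options the usage list gives for the Bosnian setting only.
BOSNIAN_OPTIONS = ["--rnn_layers", "1", "--directed", "0", "--add_from_dict", "30000"]

# Value the usage list gives for the --langs option.
MULTI_LANGS = "ar,en,es,fa,fr,zh"


def build_command(setting, name, multi=False):
    """Return the argument list for one run (hypothetical helper)."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting!r}")
    cmd = ["python", "main.py", setting]
    if setting == "bosnian":
        cmd += BOSNIAN_OPTIONS
    cmd += ["--name", name]
    if multi:
        # Assumption: --langs can be appended to any setting's command.
        cmd += ["--langs", MULTI_LANGS]
    return cmd


if __name__ == "__main__":
    # Print one command per setting; run them from the src folder.
    for setting in SETTINGS:
        print(shlex.join(build_command(setting, f"{setting}_model")))
```

The output model names used here (for example thai_t_model) are placeholders for [output_model_name].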

Note that the code automatically downloads the source word embeddings (fasttext by default), which may take considerable time and disk space.
Optionally, you can download dump files that contain all the source word embeddings needed for the settings above from https://1drv.ms/u/s!AkynV6rCKmmXkNBYwRchAWfurRkBrQ?e=NnOszA and put them in the folder data/word_emb/fasttext_wiki.
Then add the command option --use_temp_only 1 when running the code.
** To run with English as the only source language, you only need to download en.db_temp.pkl.
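
Before passing --use_temp_only 1, it may be worth confirming that the dump files are actually in place. The following is a minimal sketch under one loud assumption: that every dump file follows the pattern <lang>.db_temp.pkl, which is inferred from the single file name en.db_temp.pkl and is not confirmed for other languages. temp_only_args is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Folder named in the text above, relative to the repository root.
EMB_DIR = Path("data/word_emb/fasttext_wiki")


def temp_only_args(langs, emb_dir=EMB_DIR):
    """Return ["--use_temp_only", "1"] if a dump exists for every language.

    Returns an empty list otherwise, so the code falls back to downloading.
    Assumption: dumps are named <lang>.db_temp.pkl, as with en.db_temp.pkl.
    """
    langs = list(langs)
    if not langs:
        return []
    for lang in langs:
        if not (Path(emb_dir) / f"{lang}.db_temp.pkl").is_file():
            return []
    return ["--use_temp_only", "1"]


if __name__ == "__main__":
    # English-only source: only en.db_temp.pkl is needed.
    print(temp_only_args(["en"]))
```

The returned list can be appended to the arguments of a python main.py command; an empty list leaves the command unchanged.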

Reference

If you find the code helpful, please cite our work:

@inproceedings{chairatanakul-etal-2021-cross-lingual,
    title = "Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph",
    author = "Chairatanakul, Nuttapong  and
      Sriwatanasakdi, Noppayut  and
      Charoenphakdee, Nontawat  and
      Liu, Xin  and
      Murata, Tsuyoshi",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.130",
    pages = "1504--1517",
}
