thu-keg / copen
The official code and dataset for EMNLP 2022 paper "COPEN: Probing Conceptual Knowledge in Pre-trained Language Models".

License: MIT License

knowledge-graph language-model nlp probing model-analysis

copen's Introduction

COPEN

Dataset and code for the EMNLP 2022 paper "COPEN: Probing Conceptual Knowledge in Pre-trained Language Models". COPEN is a COnceptual knowledge Probing bENchmark that aims to analyze the conceptual understanding capabilities of Pre-trained Language Models (PLMs). Specifically, COPEN consists of three tasks:

  • Conceptual Similarity Judgment (CSJ). Given a query entity and several candidate entities, the CSJ task requires selecting the candidate entity that is most conceptually similar to the query entity.
  • Conceptual Property Judgment (CPJ). Given a statement describing a property of a concept, PLMs need to judge whether the statement is true.
  • Conceptualization in Contexts (CiC). Given a sentence, an entity mentioned in the sentence, and several concept chains of the entity, PLMs need to select, according to the context, the most appropriate concept for the entity.

Examples

Extensive experiments on PLMs of different sizes and types show that existing PLMs systematically lack conceptual knowledge and suffer from various spurious correlations. We believe this is a critical bottleneck for realizing human-like cognition in PLMs, and that more concept-aware objectives or architectures are needed to develop conceptually knowledgeable PLMs.

CodaLab

To get test-set results, you need to submit your predictions to CodaLab.

1. Quick Start

The code repository is based on PyTorch and Transformers. Please use the following command to install all the necessary dependencies:

pip install -r requirements.txt

2. Download Datasets

The COPEN benchmark is hosted on Tsinghua Cloud. Please use the following commands to download the datasets and place them in the proper paths.

cd data/
wget --content-disposition https://cloud.tsinghua.edu.cn/f/f0b33fb429fa4575aa7f/?dl=1
unzip copen_data.zip
mkdir -p task1/data
mkdir -p task2/data
mkdir -p task3/data
mv copen_data/task1/* task1/data
mv copen_data/task2/* task2/data
mv copen_data/task3/* task3/data 
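The per-task mkdir/mv steps above can be collapsed into one loop. A minimal sketch, assuming copen_data.zip has been unzipped as shown; the sample.json stand-in file is hypothetical and exists only so the sketch runs outside the repo:

```shell
# Build a throwaway stand-in tree only if the real archive contents are
# absent; "sample.json" is a hypothetical file name for illustration.
if [ ! -d copen_data ]; then
  for t in task1 task2 task3; do
    mkdir -p "copen_data/$t"
    touch "copen_data/$t/sample.json"
  done
fi

# The layout the README builds: one data/ subdirectory per task.
for t in task1 task2 task3; do
  mkdir -p "$t/data"
  mv "copen_data/$t/"* "$t/data/"
done
```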

3. Pre-processing Datasets

Probing

cd task1
python probing_data_processor.py
cd ../
cd task2
python probing_data_processor.py
cd ../
cd task3
python probing_data_processor.py
cd ../
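The three identical cd/python/cd sequences can also be written as one loop. A sketch; the stand-in scripts it creates are only for running outside the repo, where each task directory already ships its own probing_data_processor.py:

```shell
# Create no-op stand-in scripts where the real ones are missing, so the
# sketch runs anywhere; real repos already contain probing_data_processor.py.
for t in task1 task2 task3; do
  mkdir -p "$t"
  [ -f "$t/probing_data_processor.py" ] || printf 'print("preprocessed %s")\n' "$t" > "$t/probing_data_processor.py"
done

# Loop form of the three cd / python / cd pairs above.
PY=$(command -v python || command -v python3)
for t in task1 task2 task3; do
  (cd "$t" && "$PY" probing_data_processor.py)
done
```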

Fine-tuning

python processor_utils.py task1 mc 
python processor_utils.py task2 sc
python processor_utils.py task3 mc 
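Equivalently, as a loop. The meaning of the mc/sc flags is an assumption here (presumably multiple-choice vs. sentence-level classification preprocessing, matching the task formats), and the stand-in processor_utils.py exists only so the sketch runs outside the repo:

```shell
# Stand-in processor_utils.py (created only when absent) echoes its args;
# the real script ships with the repo.
[ -f processor_utils.py ] || printf 'import sys; print("processed", *sys.argv[1:])\n' > processor_utils.py
PY=$(command -v python || command -v python3)
for spec in "task1 mc" "task2 sc" "task3 mc"; do
  "$PY" processor_utils.py $spec   # unquoted $spec: intentional word-splitting
done
```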

4. Run

Probing

cd code/probing
bash task1/run.sh 0 bert bert-base-uncased
bash task2/run.sh 0 bert bert-base-uncased
bash task3/run.sh 0 bert bert-base-uncased
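To probe all three tasks in one go, the calls can be looped. A sketch from inside code/probing; the generated run.sh stand-ins exist only so it executes outside the repo, and the argument meaning (<gpu_id> <model_type> <model_name>) is inferred from the usage above:

```shell
# Create echo-only stand-ins for any missing run.sh so the sketch runs anywhere.
for t in task1 task2 task3; do
  mkdir -p "$t"
  [ -f "$t/run.sh" ] || printf 'echo "probing %s: $1 $2 $3"\n' "$t" > "$t/run.sh"
done

# Probe one checkpoint on all three tasks; extend the loop for more models.
for t in task1 task2 task3; do
  bash "$t/run.sh" 0 bert bert-base-uncased
done
```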

Fine-Tuning

cd code/finetuning
cd task1/ 
bash ../run.sh 0 bert bert-base-uncased task1 mc 42
cd task2/ 
bash ../run.sh 0 bert bert-base-uncased task2 sc 42
cd task3/ 
bash ../run.sh 0 bert bert-base-uncased task3 mc 42
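Fine-tuning results are commonly averaged over several random seeds. A sketch of a multi-seed sweep for task1 (seeds illustrative; the argument order <gpu> <model_type> <model_name> <task> <mc|sc> <seed> is inferred from the calls above, and the stand-in run.sh exists only so the sketch runs outside the repo):

```shell
# Create an echo-only stand-in ../run.sh when the real one is absent.
[ -f run.sh ] || printf 'echo "finetune $4 head=$5 seed=$6"\n' > run.sh
mkdir -p task1
# Sweep three seeds for task1; repeat for task2 (sc) and task3 (mc) as needed.
for seed in 42 43 44; do
  (cd task1 && bash ../run.sh 0 bert bert-base-uncased task1 mc "$seed")
done
```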

5. Cite

If our code or benchmark helps you, please cite us:

@inproceedings{peng2022copen,
  title={COPEN: Probing Conceptual Knowledge in Pre-trained Language Models},
  author={Peng, Hao and Wang, Xiaozhi and Hu, Shengding and Jin, Hailong and Hou, Lei and Li, Juanzi and Liu, Zhiyuan and Liu, Qun},
  booktitle={Proceedings of EMNLP},
  year={2022}
}

copen's People

Contributors

bakser, h-peng17


copen's Issues

Minor problems expected to be fixed

Hi, I'm quite interested in your work and would like to try your code.
Your code is excellent. However, there are several problems that I had to fix before successfully running it.

  1. I suggest adding mkdir task[1-3]/data to the preprocessing instructions.
  2. According to the README, the data is put in task[1-3]/data. However, in most of the .py files for preprocessing and running, the path is task[1-3]/data/ood/.
  3. For the data preprocessing of task2 probing, there is no file named data_processor_for_ppl.py. I tried using probing_data_processor.py instead.

What files should be submitted to CodaLab, and what does "mask_position" in probing mean?

Hello, I would like to try my models in the probing settings.
However, it seems that the test sets are not labelled, so we can only submit to CodaLab.
I spent several hours studying how to generate a file to be submitted to CodaLab, and now I suppose files like answer-bert-base-uncased-probs.json are what we should submit. Is that right?
Also, I found that there is an argument named "mask_position", which takes a value from [all, answer, concept], and each value yields a corresponding output file. I wonder what the difference among these values is.
