Giter Site home page Giter Site logo

kb-plugin's Introduction

KB-Plugin: A Plug-and-play Framework for LLMs to Induce Programs over Low-resourced KBs

Contributions Welcome language-python3 paper

Program induction (PI) has become a promising paradigm for using knowledge bases (KBs) to help large language models (LLMs) answer complex knowledge-intensive questions. Nonetheless, PI typically relies on a large number of parallel question-program pairs to make the LLM aware of the schema of the given KB, and is thus challenging for many low-resourced KBs that lack annotated data. To this end, we propose KB-Plugin, a plug-and-play framework that enables LLMs to induce programs over any low-resourced KB. Firstly, KB-Plugin adopts self-supervised learning to encode the detailed schema information of a given KB into a pluggable module, namely schema plugin. Secondly, KB-Plugin utilizes abundant annotated data from a rich-resourced KB to train another pluggable module, namely PI plugin, which can help the LLM extract question-relevant schema information from the schema plugin of any KB and utilize this information to induce programs over this KB. Experiments on five heterogeneous KBQA datasets show that KB-Plugin achieves better or comparable performance with 25x smaller backbone LLM compared to SoTA PI methods for low-resourced KBs, and even approaches the performance of supervised methods.

This is the accompanying code & data for the paper "KB-Plugin: A Plug-and-play Framework for Large Language Models to Induce Programs over Low-resourced Knowledge Bases".

Plugin Learning and Transfer Framework

File Structure

ProbTree/
├─  data/: put downloaded datasets here
├─  checkpoints/: put trained or downloaded plugin checkpoints here
├─  models_hf/: put your Llama2-7B model here
├─  source_kqapro/: code for training source domain schema plugins and PI plugin using KQA Pro
├─  transfer_freebase/: code for experiments on GrailQA, GraphQ, and WebQSP
└──  transfer_domain_specific/: code for experiments on MetaQA and SoAyBench

Data & Model Preparation

To run experiments, please first download Llama2-7B model and put it under ./models_hf.

Our data and plugin checkpoints can be download from here.

To conduct evaluation on GrailQA, GraphQ, and WebQSP, please follow Freebase Setup to set up a Virtuoso triplestore service. After starting your virtuoso service, replace the url in ./transfer_freebase/utils/sparql_executer.py with your own.

Training on KQA Pro

KB Generation and Data Augmentation

cd ./source_kqapro/sft_llama

Use KQA Pro to generate 16 source KBs with different schemas and corresponding augmented data.

python get_diff_kb.py

Source KB Schema Plugin Learning

Firstly, sample factual triples from each source KBs to generate training data.

python get_el_data.py --metaqa 0 # set 1 to remove relations in metaqa
python process_el_data.py --metaqa 0 

Secondly, train an individual schema plugin for each source KB.

python get_el_data.py --metaqa 0 
python process_el_data.py --metaqa 0 
bash train_el_lora_0-7.sh
bash train_el_lora_8-15.sh

PI Plugin Learning

python mix_pi_data.py --metaqa 0
python process_pi_data.py --metaqa 0
bash train_pi_plugin.sh

Transfer the PI Plugin to Other KBs

Target KB Schema Plugin Learning

Train the schema plugin for each target KB.

cd ./transfer_freebase/sft_llama
python process_el_data.py --dataset grailqa # graphq, webq
bash train_el_lora.sh 

cd ./transfer_domain_specific/sft_llama
python process_el_data.py --dataset metaqa # soay
bash train_el_lora.sh 

Evaluation

cd ./transfer_freebase
python eval_kb_plugin.py --dataset grailqa # graphq, webq

cd ./transfer_domain_specific
python eval_kb_plugin.py --dataset metaqa # soaybench

Citation

If you find our work useful, please consider citing KB-Plugin:

@article{zhang2024kbplugin,
      title={KB-Plugin: A Plug-and-play Framework for Large Language Models to Induce Programs over Low-resourced Knowledge Bases}, 
      author={Jiajie Zhang and Shulin Cao and Linmei Hu and Ling Feng and Lei Hou and Juanzi Li},
      journal={arXiv preprint arXiv:2402.01619},
      year={2024}
}

kb-plugin's People

Contributors

neo-zhangjiajie avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.