karthikncode / deeprl-informationextraction

Code for the paper "Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning" http://arxiv.org/abs/1603.07954

License: MIT License

deeprl-informationextraction's Introduction

Information Extraction with Reinforcement Learning

Installation

You will need to install Torch and the Python packages listed in requirements.txt.

You will also need to install the Lua dev library liblua (sudo apt-get install liblua5.2) and the signal package for Torch to deal with SIGPIPE issues in Linux. (You may need to uninstall the signal-fft package or rename it to avoid conflicts.)

Data Preparation

Create the vectorizers (using a pre-trained model), for example:
python vec_consolidate.py dloads/Shooter/train.extra 5 trained_model2.p consolidated/vec_train.5.p
python vec_consolidate.py dloads/Shooter/dev.extra 5 trained_model2.p consolidated/vec_dev.5.p

Consolidate the articles, for example:
python consolidate.py dloads/Shooter/train.extra 5 trained_model2.p consolidated/train+context.5.p consolidated/vec_train.5.p
python consolidate.py dloads/Shooter/dev.extra 5 trained_model2.p consolidated/dev+context.5.p consolidated/vec_dev.5.p

Make sure you use the correct vec_xyz.p file as input to the consolidate.py script. You can also find pre-consolidated files in pickle format here: link
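One way to keep each vec_xyz.p file paired with the right split is to generate both command lines from a single helper. The following is a minimal sketch, not part of the repository: the function name `data_prep_commands` and its parameter names are hypothetical, and the defaults simply mirror the example paths above. It only builds and prints the commands; it does not run them.

```python
import shlex


def data_prep_commands(split, n=5, model="trained_model2.p",
                       data_dir="dloads/Shooter", out_dir="consolidated"):
    """Build the two data-preparation commands for one split ("train" or "dev").

    `n` is the numeric argument shown as 5 in the README commands; its meaning
    is not stated there, so it is passed through unchanged.
    """
    extra = f"{data_dir}/{split}.extra"
    vec = f"{out_dir}/vec_{split}.{n}.p"
    context = f"{out_dir}/{split}+context.{n}.p"
    # Step 1: create the vectorizer for this split.
    vectorize = ["python", "vec_consolidate.py", extra, str(n), model, vec]
    # Step 2: consolidate the articles, reusing the same vectorizer file.
    consolidate = ["python", "consolidate.py", extra, str(n), model, context, vec]
    return [vectorize, consolidate]


if __name__ == "__main__":
    for split in ("train", "dev"):
        for cmd in data_prep_commands(split):
            print(shlex.join(cmd))
```

Because the vectorizer path is computed once per split and reused, the consolidate.py command cannot accidentally receive the vectorizer of the other split.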

Running the code

Change to the code directory: cd code/
First run the server, for example:
python server.py --port 7000 --trainEntities consolidated/train+context.5.p --testEntities consolidated/dev+context.5.p --outFile outputs/run.out --modelFile trained_model2.p --entity 4 --aggregate always --shooterLenientEval True --delayedReward False --contextType 2
In a separate terminal/tab, change to the agent code directory: cd code/dqn/
Then run the agent:
./run_cpu 7000 logs/tmp/
Make sure the port numbers for the server and agent match up.
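Before starting the agent, it can help to confirm that something is actually listening on the chosen port. The sketch below is an illustration only and assumes the server accepts TCP connections on the local machine; the helper name `port_is_open` is hypothetical and is not part of this repository.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    port = 7000  # should equal --port for server.py and the first argument to ./run_cpu
    if port_is_open("127.0.0.1", port):
        print(f"Something is listening on port {port}")
    else:
        print(f"Nothing is listening on port {port}; start the server first")
```

This only checks that the port is reachable; it does not verify that the process behind it is server.py.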

Acknowledgements

DeepMind's DQN codebase

deeprl-informationextraction's Issues

Error when running consolidate.py

Hi @karthikncode,

I did run the entire pipeline a couple of months back, without any issues.

But, when I run it now, I get the following error while preparing the data:

When I run - "python consolidate.py dloads/Shooter/train.extra 5 trained_model2.p consolidated/train+context.5.p consolidated/vec_train.5.p"

I get this -

It seems to be some kind of index error in 'predict.py'.
I tried it on different machines to no avail. Any ideas on how this could be fixed?

Questions about the query templates

@karthikncode
The paper says: "The queries are based on automatically generated templates, created using the title of an article along with words most likely to co-occur with each entity type in the training data." I have a couple of questions regarding these templates:

Where in the code or data files are these templates generated or stored? I have scanned through the files and was unable to find them.
What is your opinion on using Word2Vec to help create such templates? My thinking is that, since Word2Vec can discover structured relationships in the training corpus, query templates created with it might be more relevant and effective for querying.

Thanks!

pycrfsuite required?

It seems that requirements.txt is missing pycrfsuite?
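A quick way to confirm which modules are unavailable in the current environment is to ask Python whether it can locate them. This is a minimal sketch with a hypothetical helper name, `missing_modules`; the module list passed to it is only an example and is not taken from requirements.txt. Note that the `pycrfsuite` module is distributed on PyPI under the package name `python-crfsuite`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Example list only; extend it with whatever the scripts import.
    missing = missing_modules(["pycrfsuite"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable")
```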