
karthikncode / deeprl-informationextraction


Code for the paper "Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning" http://arxiv.org/abs/1603.07954

License: MIT License

Python 0.02% Shell 0.01% Lua 0.01% Jupyter Notebook 0.01% OpenEdge ABL 2.43% Roff 97.54%

deeprl-informationextraction's Introduction

Information Extraction with Reinforcement Learning

Installation

You will need to install Torch and the python packages in requirements.txt.

You will also need to install the Lua development library liblua (sudo apt-get install liblua5.2) and the signal package for Torch, which is needed to deal with SIGPIPE issues on Linux. (You may need to uninstall or rename the signal-fft package to avoid conflicts.)

Data Preparation

Create the vectorizers (using a pre-trained model), for example:
python vec_consolidate.py dloads/Shooter/train.extra 5 trained_model2.p consolidated/vec_train.5.p
python vec_consolidate.py dloads/Shooter/dev.extra 5 trained_model2.p consolidated/vec_dev.5.p

Consolidate the articles, for example:
python consolidate.py dloads/Shooter/train.extra 5 trained_model2.p consolidated/train+context.5.p consolidated/vec_train.5.p
python consolidate.py dloads/Shooter/dev.extra 5 trained_model2.p consolidated/dev+context.5.p consolidated/vec_dev.5.p

Make sure you pass the matching vec_xyz.p file to the consolidate.py script (vec_train.5.p for the train split, vec_dev.5.p for the dev split, as in the examples above). You can also find pre-consolidated files in pickle format here: link
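The pairing between the two steps can be sketched in Python. This is a minimal illustration, not part of the repository: the helper name prep_commands is hypothetical, the argument order is copied from the examples above, and the meaning of the 5 argument is not stated in this README, so it is passed through unchanged. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def prep_commands(split, n="5", model="trained_model2.p",
                  data_dir="dloads/Shooter", out_dir="consolidated"):
    """Return the two data-preparation commands for one split ("train" or "dev").

    Hypothetical helper: argument order mirrors the README examples.
    """
    extra = f"{data_dir}/{split}.extra"
    # The vectorizer file written by step 1 and read by step 2.
    vec = f"{out_dir}/vec_{split}.{n}.p"
    context = f"{out_dir}/{split}+context.{n}.p"
    return [
        ["python", "vec_consolidate.py", extra, n, model, vec],
        ["python", "consolidate.py", extra, n, model, context, vec],
    ]


if __name__ == "__main__":
    for split in ("train", "dev"):
        for cmd in prep_commands(split):
            print(shlex.join(cmd))
```

Deriving both commands from the same vec path means the file consolidate.py reads is always the one vec_consolidate.py wrote for that split.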

Running the code

  • Change to the code directory: cd code/

  • First run the server, for example:
    python server.py --port 7000 --trainEntities consolidated/train+context.5.p --testEntities consolidated/dev+context.5.p --outFile outputs/run.out --modelFile trained_model2.p --entity 4 --aggregate always --shooterLenientEval True --delayedReward False --contextType 2

  • In a separate terminal/tab, change to the agent code directory: cd code/dqn/

  • Then run the agent:
    ./run_cpu 7000 logs/tmp/
    Make sure the port numbers for the server and agent match up.
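
One way to keep the two port numbers in sync is to derive both commands from a single value. The following Python sketch assumes the flags and paths shown in the example above; the helper name run_commands is hypothetical and not part of the repository. It only builds the command lines and does not start either process.

```python
import shlex


def run_commands(port=7000, log_dir="logs/tmp/"):
    """Return (server_cmd, agent_cmd) that share one port value.

    Hypothetical helper: flags and paths are copied from the README example.
    """
    server = [
        "python", "server.py",
        "--port", str(port),
        "--trainEntities", "consolidated/train+context.5.p",
        "--testEntities", "consolidated/dev+context.5.p",
        "--outFile", "outputs/run.out",
        "--modelFile", "trained_model2.p",
        "--entity", "4",
        "--aggregate", "always",
        "--shooterLenientEval", "True",
        "--delayedReward", "False",
        "--contextType", "2",
    ]
    # The agent takes the port as its first positional argument.
    agent = ["./run_cpu", str(port), log_dir]
    return server, agent


if __name__ == "__main__":
    server_cmd, agent_cmd = run_commands(7000)
    print("in code/:     ", shlex.join(server_cmd))
    print("in code/dqn/: ", shlex.join(agent_cmd))
```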

Acknowledgements

deeprl-informationextraction's People

Contributors

adi-sharma, karthikncode, yala


deeprl-informationextraction's Issues

Error when running consolidate.py

Hi @karthikncode,

I ran the entire pipeline a couple of months ago without any issues.

But when I run it now, I get an error while preparing the data.

When I run "python consolidate.py dloads/Shooter/train.extra 5 trained_model2.p consolidated/train+context.5.p consolidated/vec_train.5.p"

I get this:

[Screenshot of the error, 2017-02-10, 11:19:59 AM]

It seems to be some kind of index error in 'predict.py'.
I tried it on different machines, to no avail. Any ideas on how this could be fixed?

Right click crash

SpongeForge version: spongeforge-1.12.2-2611-7.1.0-BETA-2996

Forge version: forge-1.12.2-14.23.2.2611-universal

Java version: openjdk version "1.8.0_161"

Operating System: centos 7

Plugins/Mods:
SpongeAPI, SpongeForge, Actually Additions, Advanced Generators, BD Lib, Baubles, Bed Patch, BedFix, Binnie Core, Binnie's Botany, Binnie's Design, Binnie's Extra Bees, Binnie's Extra Trees, Binnie's Genetics, Blood Magic: Alchemical Wizardry, Botania, Brandon's Core, CoFH Core, CoFH World, CodeChicken Lib, Cyclops Core, Draconic Evolution, Epic Siege Mod, Erebus, EvilCraft, EvilCraft-Compat, FTB Utilities, FTBLib, Forestry, Galacticraft Core, Galacticraft Planets, Guide-API, Immersive Engineering, Improved Extraction, Industrial Foregoing, IndustrialCraft 2, Iron Chest, Just Enough Items, MJRLegendsLib, Mantle, Mekanism, MekanismGenerators, MekanismTools, Micdoodle8 Core, More Planets, NetherPortalFix, NuclearCraft, OMLib, Open Modular Passive Defense, Open Modular Turrets, Planet Progression, PlusTiC, Reborn Core, Redstone Flux, Reforged, Shadowfacts' Forgelin, TESLA, Tech Reborn, Tesla Core Lib, Tesla Core Lib Registries, The Betweenlands, Thermal Cultivation, Thermal Dynamics, Thermal Expansion, Thermal Foundation, Tinker I/O, Tinkers' Construct, UniDict, WanionLib

Issue Description
Right click item: improvedextraction:auto_cutting_table crash

Error log:
https://gist.github.com/xxlio109/095e70a9bbe8142d450ec2bbcdcd25ee

sponge git:
SpongePowered/SpongeForge#2073

Questions about the query templates

@karthikncode
The paper mentions: "The queries are based on automatically generated templates, created using the title of an article along with words most likely to co-occur with each entity type in the training data." I have a couple of questions regarding these templates:

  1. Where in the code or data files are these templates generated or stored? I have scanned through the files and was unable to find them.
  2. What is your opinion on using Word2Vec to help create such templates? My thinking is that since Word2Vec can discover structured relationships in the training corpus, query templates created with it might be more relevant and effective for querying.

Thanks!
