lashoun / deepspeare

This project forked from jhlau/deepspeare

Code for Deep-speare: a joint neural model of poetic language, meter and rhyme

License: Apache License 2.0

Requirements

  • python2.7
  • tensorflow 0.12
    • CPU: pip install tensorflow==0.12.0
    • GPU: pip install tensorflow-gpu==0.12.0
  • nltk, with the cmudict and stopwords corpora
    • import nltk; nltk.download("cmudict"); nltk.download("stopwords")
  • gensim
    • pip install gensim
  • scikit-learn (imported as sklearn)
    • pip install scikit-learn
  • numpy
    • pip install numpy
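
Before training, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative, and the list assumes the import names match the requirements above.

```python
import importlib

# Import names taken from the requirements list above (assumption:
# scikit-learn is imported under the name "sklearn").
REQUIRED_MODULES = ["tensorflow", "nltk", "gensim", "sklearn", "numpy"]


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed in this environment.
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when everything is installed; any names it returns still need a `pip install`. It does not check versions, so tensorflow 0.12 still has to be verified separately.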

Data / Models

  • datasets/gutenberg/data.tgz: sonnet data, with train/valid/test splits
  • pretrain_word2vec/dim100/*: pre-trained word2vec model
  • trained_model/model.tgz: trained sonnet model

Pre-training Word Embeddings

  • The pre-trained word2vec model has already been supplied: pretrain_word2vec/dim100/*
  • It was trained on 34M Gutenberg poetry data: download link
  • If you want to train your own word embeddings, you can use the python script (uses gensim's word2vec)
    • python pretrain_word2vec.py
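
For readers who want to see roughly what training embeddings with gensim's word2vec involves, here is a hedged sketch. It is not the contents of pretrain_word2vec.py: the class `LineSentences`, the function `train_embeddings`, the one-sentence-per-line corpus layout, the lowercasing, and the output format are all assumptions made for illustration. Only the 100-dimensional setting comes from the `dim100` directory name above.

```python
import io


class LineSentences(object):
    """Iterate over a text file, yielding one list of tokens per line."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # A fresh file handle per pass lets word2vec iterate several times.
        with io.open(self.path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.lower().split()
                if tokens:
                    yield tokens


def train_embeddings(corpus_path, output_path, dim=100):
    """Train word2vec on a one-sentence-per-line corpus (illustrative only)."""
    # Imported here so the tokenizer above can be used without gensim.
    from gensim.models import Word2Vec

    sentences = LineSentences(corpus_path)
    try:
        # gensim >= 4.0 names the dimensionality parameter `vector_size`.
        model = Word2Vec(sentences, vector_size=dim)
    except TypeError:
        # Older gensim releases call the same parameter `size`.
        model = Word2Vec(sentences, size=dim)
    # gensim's native format; the repository's own format may differ.
    model.save(output_path)
    return model
```

All other word2vec hyper-parameters are left at gensim's defaults here; the script in the repository may set them differently.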

Training the Sonnet Model

  1. Extract the data; it should produce the train/valid/test splits
    • cd datasets/gutenberg; tar -xvzf data.tgz
  2. Unzip the pre-trained word2vec model
    • gunzip pretrain_word2vec/dim100/*
  3. Set up model hyper-parameters and other settings, which are all defined in config.py
    • the default configuration is the optimal configuration used in the paper (documented here)
  4. Run python sonnet_train.py
    • takes about 2-3 hours on a single K80 GPU to train 30 epochs
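
Steps 1 and 2 above use `tar` and `gunzip`. On systems without those tools, the same extraction can be sketched with the Python standard library. The helper names `extract_tgz` and `gunzip_file` are hypothetical and not part of the repository.

```python
import gzip
import os
import shutil
import tarfile


def extract_tgz(archive_path, dest_dir):
    """Unpack a .tgz archive into dest_dir, similar to `tar -xzf`."""
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe member paths.
            archive.extractall(dest_dir, filter="data")
        else:
            archive.extractall(dest_dir)


def gunzip_file(gz_path):
    """Decompress foo.gz to foo and delete foo.gz, similar to `gunzip`."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file: " + gz_path)
    out_path = gz_path[:-len(".gz")]
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(gz_path)
    return out_path
```

For example, `extract_tgz("datasets/gutenberg/data.tgz", "datasets/gutenberg")` mirrors step 1, and calling `gunzip_file` on each `.gz` file under `pretrain_word2vec/dim100/` mirrors step 2.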

Generating Sonnet Quatrains

  1. Extract the trained model
    • cd trained_model; tar -xvzf model.tgz
  2. Run python sonnet_gen.py -m trained_model
    • the default configuration is the generation configuration used in the paper
    • takes about a minute to generate one quatrain on CPU (GPU not necessary)
usage: sonnet_gen.py [-h] -m MODEL_DIR [-n NUM_SAMPLES] [-r RM_THRESHOLD]
                     [-s SENT_SAMPLE] [-a TEMP_MIN] [-b TEMP_MAX] [-d SEED]
                     [-v] [-p SAVE_PICKLE]

Loads a trained model to do generation

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL_DIR, --model-dir MODEL_DIR
                        directory of the saved model
  -n NUM_SAMPLES, --num-samples NUM_SAMPLES
                        number of quatrains to generate (default=1)
  -r RM_THRESHOLD, --rm-threshold RM_THRESHOLD
                        rhyme cosine similarity threshold (0=off; default=0.9)
  -s SENT_SAMPLE, --sent-sample SENT_SAMPLE
                        number of sentences to sample from using pentameter
                        loss as sample probability (1=turn off sampling;
                        default=10)
  -a TEMP_MIN, --temp-min TEMP_MIN
                        minimum temperature for word sampling (default=0.6)
  -b TEMP_MAX, --temp-max TEMP_MAX
                        maximum temperature for word sampling (default=0.8)
  -d SEED, --seed SEED  seed for generation (default=1)
  -v, --verbose         increase output verbosity
  -p SAVE_PICKLE, --save-pickle SAVE_PICKLE
                        save samples in a pickle (list of quatrains)
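
When generating many quatrains with different seeds or temperatures, it can be convenient to build the command line programmatically. The sketch below only assembles the flags documented in the usage text above, with the defaults listed there. The function names `build_gen_command` and `run_generation` are hypothetical, and the sketch assumes `python` on the PATH is the Python 2.7 interpreter and that it is run from the repository root.

```python
import subprocess


def build_gen_command(model_dir, num_samples=1, rm_threshold=0.9,
                      sent_sample=10, temp_min=0.6, temp_max=0.8,
                      seed=1, verbose=False, save_pickle=None):
    """Return the argument list for one sonnet_gen.py invocation."""
    command = [
        "python", "sonnet_gen.py",
        "-m", model_dir,
        "-n", str(num_samples),
        "-r", str(rm_threshold),
        "-s", str(sent_sample),
        "-a", str(temp_min),
        "-b", str(temp_max),
        "-d", str(seed),
    ]
    if verbose:
        command.append("-v")
    if save_pickle is not None:
        command.extend(["-p", save_pickle])
    return command


def run_generation(**kwargs):
    """Run sonnet_gen.py in the current directory; return its exit code."""
    return subprocess.call(build_gen_command(**kwargs))
```

For example, `run_generation(model_dir="trained_model", seed=2)` corresponds to `python sonnet_gen.py -m trained_model -d 2` with every other option spelled out at its default value.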

Generated Quatrains:

python sonnet_gen.py -m trained_model/ -d 1

Temperature = 0.6 - 0.8
  01  [0.43]  with joyous gambols gay and still array
  02  [0.44]  no longer when he twas, while in his day
  03  [0.00]  at first to pass in all delightful ways
  04  [0.40]  around him, charming and of all his days
  
  
python sonnet_gen.py -m trained_model/ -d 2
  
Temperature = 0.6 - 0.8
  01  [0.44]  shall i behold him in his cloudy state
  02  [0.00]  for just but tempteth me to stop and pray
  03  [0.00]  a cry: if it will drag me, find no way
  04  [0.40]  from pardon to him, who will stand and wait
  
  

Crowdflower and Expert Evaluation

  • Annotations can be found in the folder: evaluation_annotation/

Media Coverage

Publication

Jey Han Lau, Trevor Cohn, Timothy Baldwin, Julian Brooke and Adam Hammond (2018). Deep-speare: A joint neural model of poetic language, meter and rhyme (Supplementary Material). In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, pp. 1948-1958.

Talk

Creativity, Machine and Poetry for a public forum on language [video]
