
shorties's Introduction

The Algorithmia Shorties Contest

Generating short story fiction with algorithms

The Algorithmia Shorties contest is designed to help programmers of all skill levels get started with Natural Language Processing tools and concepts. We are big fans of NaNoGenMo here at Algorithmia, even producing our own NaNoGenMo entry this year, so we thought we'd replicate the fun by creating this generative short story competition!

The Prizes

We'll be giving away $300 USD for the top generative short story entry!

Additionally there will be two $100 Honorable Mention prizes for outstanding entries. We'll also highlight the winners and some of our favorite entries on the Algorithmia blog.

The Rules

We're pretty fast and loose with what constitutes a short story. You can define what the "story" part of your project is, whether that means your story is completely original, a modified copy of another book, a collection of tweets woven into a story, or just a nonsensical collection of words! The only requirements are that your story is primarily in English and no more than 7,500 words.

Each story will be evaluated with the following rubric:

  • Originality
  • Readability
  • Creative use of the Algorithmia API

We'll read through all the entries and grab the top 20. The top 20 stories will be sent to two Seattle school teachers for some old-school red ink grading before the final winner selection.

The contest runs from December 9th to January 9th. Your submission must be entered before midnight PST on January 9th. Winners will be announced on January 13th.

How to generate a short story

Step One: Find a Corpus

The corpus is the set of texts that you will be using to base your short story on. For this project, you can base your short story on just one book or you can grab a whole selection of texts from various sources to be your corpus.

If you want to base your short story on another book, try the public domain section of Feedbooks. Another good site to check out is Project Gutenberg, which is home to over 50,000 ebooks. All of the public domain section on Feedbooks as well as most of the content on Project Gutenberg is available for you to use without infringing on copyright law. Other interesting corpora that folks have used as a base for their generative fiction include software license agreements, personal journals, public speeches, or Wikipedia articles. It's really up to you to choose what you find most interesting!

If you use just one book or text source, your generated short story will be much more similar to the original work than if you combine multiple text sources. The same goes for corpus length: a smaller corpus gives the generator fewer word sequences to choose from, so the output stays closer to the source. Since we are doing short stories, using a smaller corpus is just fine.
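If you do decide to mix several sources, one simple approach is to concatenate the cleaned texts into a single file before moving on. The sketch below assumes each source is already a UTF-8 .txt file; `build_corpus` and the file names in the usage line are hypothetical, not part of the contest tooling.

```python
from pathlib import Path


def build_corpus(paths, output_path):
    """Concatenate several cleaned .txt files into one corpus file.

    Texts are separated by a blank line so the last sentence of one source
    does not run into the first sentence of the next.
    """
    texts = [Path(p).read_text(encoding="utf-8").strip() for p in paths]
    corpus = "\n\n".join(texts) + "\n"
    Path(output_path).write_text(corpus, encoding="utf-8")
    return corpus
```

For example, `build_corpus(["rightho.txt", "speeches.txt"], "corpus.txt")` would produce one file you can feed into the next step in place of a single book.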

Step Two: Generate Trigrams

Now let's get to work. The first thing you want to do with the text you have chosen is to make sure that it's in a pretty clean state. If the book or other text you've selected has copyright notices, table of contents, or other text that won't be needed in your short story, go ahead and remove that so you end up with a .txt file containing only the text you want to base your story on.

I decided to make a little short story based on Right Ho, Jeeves by P.G. Wodehouse. I grabbed the book from Project Gutenberg, so there was a little bit of cleanup to do. I used Guten-gutter, a Python tool for cleaning up Project Gutenberg texts.
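If you'd rather do a rough cleanup yourself, most Project Gutenberg files wrap the book between marker lines beginning with `*** START OF` and `*** END OF`. The sketch below keeps only the text between those markers; it is a hypothetical helper, not Guten-gutter's code, it assumes those marker lines are present (older files may word them differently), and it does not remove a table of contents, which you may still want to trim by hand.

```python
def strip_gutenberg_boilerplate(text):
    """Return only the body of a Project Gutenberg text.

    Assumes the common '*** START OF ... ***' / '*** END OF ... ***' marker
    lines; if either marker is missing, that side of the text is kept as-is.
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        marker = line.strip().upper()
        if marker.startswith("*** START OF") and start == 0:
            start = i + 1  # body begins on the line after the START marker
        elif marker.startswith("*** END OF"):
            end = i  # body ends just before the END marker
            break
    return "\n".join(lines[start:end]).strip()
```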

We want to take our text and create a trigram file that we will use in step three to generate new text. The trigram model that we are building is essentially a list of word sequences, in our case sequences of 3 words, that appear in the text. Read more about n-grams to get a deeper understanding of what the algorithm is constructing.
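To make the idea concrete, here is a small local sketch of trigram counting. It pads each sentence with two begin tags and one end tag (reusing the tag strings from the script in this section) so the counts also record which words start and finish sentences. The padding scheme and the `Counter` keyed by word triples are assumptions for illustration; the Generate Trigram Frequencies algorithm's actual output file format may differ.

```python
from collections import Counter

BEGIN, END = "xxBeGiN142xx", "xxEnD142xx"


def trigram_frequencies(sentences, begin=BEGIN, end=END):
    """Count every run of three consecutive tokens in each sentence.

    Each sentence is padded with two begin tags and one end tag so the model
    also learns which words can start and finish a sentence.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = [begin, begin] + sentence.split() + [end]
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts
```

For the two sentences "the cat sat" and "the cat ran", the triple (BEGIN, BEGIN, "the") is counted twice, while ("the", "cat", "sat") is counted once.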

Let's walk through a short Python script based on the one that I used to generate a novel for NaNoGenMo last month. You can find the full script here, which you can run with python generate-trigrams.py. While I chose to do this in Python, feel free to use whichever language you're most comfortable working in!

First things first, we need to import the Algorithmia client. If you haven't used Algorithmia before, give the Python docs a quick glance. Short version: install the client with pip install algorithmia.

import Algorithmia
import os

client              = Algorithmia.client('YOUR_API_KEY_HERE')
trigram_file_name   = "right-ho-trigrams.txt"

As you can see, we've also set up a few variables at the start of our script to help keep things neat. Be sure to replace YOUR_API_KEY_HERE with the API key from your account. Find this key in your Algorithmia profile by clicking on your user name in the top right-hand side of the Algorithmia site.

To create our short story, we're going to use the algorithms Generate Trigram Frequencies and Sentence Split. Because Generate Trigram Frequencies takes an array of strings, we'll run our entire text file through Sentence Split, which conveniently takes a block of text and returns the sentences as an array of strings.

We first open the file and set the content as our input variable. On the next line, we send that input to Algorithmia by piping the input to the algorithm with client.algo('StanfordNLP/SentenceSplit/0.1.0').pipe(input). This will return the array of sentences we need to pass into the Generate Trigram algorithm.

# generate array of sentences
with open('rightho.txt', 'r') as content_file:
    input   = content_file.read()
    corpus  = client.algo('StanfordNLP/SentenceSplit/0.1.0').pipe(input)
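If you want to experiment with the rest of the pipeline offline before calling Sentence Split, a crude regex splitter can stand in for it. This is a hypothetical helper, not what StanfordNLP/SentenceSplit does internally: it simply splits after `.`, `!` or `?` followed by whitespace, so it will wrongly break abbreviations such as "Mr. Wooster".

```python
import re

# Split after '.', '!' or '?' when followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def naive_sentence_split(text):
    """Very rough local stand-in for a real sentence splitter.

    Unlike an NLP sentence splitter, this breaks on abbreviations, so
    'Mr. Wooster left.' becomes ['Mr.', 'Wooster left.'].
    """
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]
```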

Now that we have the sentences, we'll pass them into the Generate Trigram Frequencies algorithm along with two tags that mark the beginning and end of the data. The final parameter is the address of the output file in your Data section on Algorithmia. There's no need to modify the last three parameters: the tags can be copied as-is, and the data URL will automatically put the file into your Data!

# generate trigrams: [sentences, begin tag, end tag, output file in your Data]
input = [corpus,
         "xxBeGiN142xx",
         "xxEnD142xx",
         "data://.algo/temp/" + trigram_file_name]

trigrams_file = client.algo('ngram/GenerateTrigramFrequencies/0.1.1').pipe(input)

print("Done! Your trigrams file is now available on Algorithmia.")
print(trigrams_file)

Ok, cool! Now we've got a trigram model that we can use to generate our short story.

Step Three: Generate Paragraphs

While you can download the trigram file if you want, the Data API makes it easy to use it directly as an argument to the algorithm that we'll use to generate text. The algorithm that generates the trigrams returns the address of the trigram file in your Data collection. Navigate to the "My Data" section on Algorithmia, where you'll find a section for the algorithm on the left-hand side.


You'll see the newly created trigram file already there for you to use! Copy the full address of the file listed right below the filename. We'll pass this file location to the algorithms we use next.

First, let's make sure that our trigram model will generate some text for us. I like to do a quick sanity check by going to the algorithm Random Text From Trigram and inserting the Data address of my trigram model right into the in-browser sample code runner. When I stuck in my trigram model, Random Text From Trigram returned "It will be killing two birds with one stone, sir." Looks good!
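Under the hood, generating a sentence from a trigram model means repeatedly picking a next word based on the two words before it. The sketch below shows that idea locally; it is an assumption about roughly what an algorithm like Random Text From Trigram has to do, not its actual implementation. It expects trigram counts keyed by word triples, assumes each sentence was padded with two begin tags, and draws each next word in proportion to how often it followed the current pair.

```python
import random
from collections import defaultdict

BEGIN, END = "xxBeGiN142xx", "xxEnD142xx"


def random_sentence(trigram_counts, rng=None, max_words=60):
    """Generate one sentence by repeatedly sampling the next word.

    trigram_counts maps (word1, word2, word3) -> count. Starting from the
    pair (BEGIN, BEGIN), the next word is drawn in proportion to how often it
    followed the current pair, until END is drawn or max_words is reached.
    """
    rng = rng or random.Random()
    followers = defaultdict(dict)
    for (a, b, c), count in trigram_counts.items():
        followers[(a, b)][c] = count
    pair, words = (BEGIN, BEGIN), []
    while len(words) < max_words:
        options = followers.get(pair)
        if not options:
            break  # dead end: no trigram starts with this pair
        word = rng.choices(list(options), weights=list(options.values()))[0]
        if word == END:
            break
        words.append(word)
        pair = (pair[1], word)  # slide the two-word window forward
    return " ".join(words)
```

With a model built from a single sentence there is only one path through it, so the sentence comes back unchanged; with a whole book, shared word pairs let the walk jump between sentences, which is where the odd Wodehouse mashups come from.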

Now let's set up a script to generate a whole short story:

import Algorithmia
import io
import re
from random import randint

client            = Algorithmia.client('YOUR_API_KEY_HERE')
trigrams_file     = 'data://.algo/ngram/GenerateTrigramFrequencies/temp/right-ho-trigrams.txt'
book_title        = 'full_book.txt'
book              = ''
book_word_length  = 7500

# keep adding paragraphs until the book reaches the target word count
while len(re.findall(r'\w+', book)) < book_word_length:
    print("Generating new paragraph...")
    # last parameter: paragraph length, a random number of sentences from 1 to 9
    input = [trigrams_file, "xxBeGiN142xx", "xxEnD142xx", randint(1, 9)]
    new_paragraph = client.algo('/lizmrush/GenerateParagraphFromTrigram').pipe(input)
    book += new_paragraph
    book += '\n\n'
    print("Updated word count:")
    print(len(re.findall(r'\w+', book)))

# io.open writes the text as UTF-8 on both Python 2 and 3;
# the with block closes the file automatically
with io.open(book_title, 'w', encoding='utf8') as f:
    f.write(book)

print("Done!")
print("Your book is now complete. Give " + book_title + " a read now!")

Be sure to update the trigrams_file variable with the address of your trigram file. It will look very similar, except for whatever you named it!

In this script, a simple loop checks the book's length and, while it is less than 7,500 words, calls the algorithm Generate Paragraph From Trigram. Again we pass in the beginning and ending tags from our trigram model, and this time we also specify a paragraph length, in sentences, as the final parameter. I let my program pick a random number between 1 and 9 with randint(1,9).

Finally, the script writes the entire book to a .txt file in the same directory, and you're ready to start reading!
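One thing to watch: the loop only stops once the count reaches 7,500, so the last paragraph can push the book past the contest's 7,500-word limit. The sketch below is a hypothetical variant that never goes over. It takes any `make_paragraph(n)` function, for example a wrapper like `lambda n: client.algo('/lizmrush/GenerateParagraphFromTrigram').pipe([trigrams_file, "xxBeGiN142xx", "xxEnD142xx", n])`, and counts words with the same `\w+` pattern as the script, keeping a running total instead of recounting the whole book each time.

```python
import random
import re

WORD = re.compile(r"\w+")  # same word definition as the script above


def generate_book(make_paragraph, max_words=7500, rng=None):
    """Collect paragraphs without ever going over max_words words.

    make_paragraph(n) should return a paragraph of n sentences. Stops at the
    first paragraph that would push the total past max_words.
    Returns (book_text, word_count).
    """
    rng = rng or random.Random()
    paragraphs, total = [], 0
    while True:
        paragraph = make_paragraph(rng.randint(1, 9))
        added = len(WORD.findall(paragraph))
        if added == 0:
            raise ValueError("make_paragraph returned no words")
        if total + added > max_words:
            break  # this paragraph would break the word limit; drop it
        paragraphs.append(paragraph)
        total += added
    return "\n\n".join(paragraphs) + "\n", total
```

The trade-off is that the book may land a little under 7,500 words rather than a little over, which keeps the entry within the rules.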

Moving Beyond the Trigram

Now that you've got the basics down, feel free to make changes to the scripts or come up with something totally different. If you need inspiration on what kinds of books you can generate, be sure to check out the NaNoGenMo 2015 repo to see what other fiction generating programmers have come up with!

Where can you take this next? Here are some other algorithms available in the Marketplace that you might consider trying out to add some extra spice to your short story:

Spend a few minutes browsing the marketplace for other text and language related algorithms. You might find an unexpected algorithm that inspires you to try something new!

How to submit

Ready to share? We've set up this repo so we can read one another's stories! All you need to do is open an issue with a link to your code repository, plus a link to your story if you choose to host it somewhere else. You can also use the issues as a means to get help with your short story. Just comment on your issue if you are stuck or have any questions and we'll help out!


shorties's Issues

n+7

Many years ago (I think it was in 2001) I took part in a class on Mathematics & Literature. Among other techniques, the "n+7 method" was presented: you replace each noun in a text with the noun that comes 7 places after it in a dictionary. For example, "a man meets a woman" becomes "a mandrake meets a wonder" using the Langenscheidt dictionary.

OK, here's my submission:

Note: This is my first algorithm on Algorithmia and my first python script - bear with me! :-)
Thanks for the contest, it was a nice exercise.
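The n+7 method described above can be sketched in a few lines. A real n+7 needs a printed dictionary's noun list and some way to tell which words in the text are nouns (for example a part-of-speech tagger); this hypothetical sketch sidesteps both by taking a noun list as input, sorting it alphabetically as the "dictionary", and wrapping around at the end. It is not the submitter's code.

```python
import re


def n_plus_7(text, nouns, n=7):
    """Replace every word found in `nouns` with the noun n places later.

    `nouns` stands in for the dictionary: it is sorted alphabetically and
    indexing wraps around at the end. Words not in the list are left alone.
    """
    dictionary = sorted({w.lower() for w in nouns})
    position = {w: i for i, w in enumerate(dictionary)}

    def swap(match):
        word = match.group(0)
        i = position.get(word.lower())
        if i is None:
            return word  # not a known noun: keep the original word
        new = dictionary[(i + n) % len(dictionary)]
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"[A-Za-z]+", swap, text)
```

With a tiny three-noun dictionary and `n=1`, "The cat chased a dog." becomes "The dog chased a fish."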

Wrong Ho Jeeves [SAMPLE SUBMISSION]

This is the short story I generated using the sample python code and using the book Right Ho, Jeeves by P.G. Wodehouse as the basis for training my text generator.

I've uploaded the code I used to this gist, along with the full text of my short story.

Here's a preview of some of my favorite parts of the generated story so you can get a taste:

He quivered like a practical working proposition. I could not have embraced it eagerly but that the boy Glossop is the same dish day after her, wishing that some hideous disaster would strike this house like a full aunt. It would appear to square with your statement. Can't you see we shall, by any chance? Give me that there was no longer, sir. Now here's something else: You mean--for love? We talked for a pretty affair? And then? It will make an effort.

We're out here, laughing heartily. You would be over.

I especially love this part! It's almost self-aware.

I beg your pardon, sir, surely you can readily appreciate the point: What's all this mean?

No doubt. Right-ho. About nothing. I am like a little story.

[Note: This entry generated with the sample code is just intended to give you an idea of what you can submit as your entry in this issues area. Feel free to get creative with it & don't forget that you can open an issue before you are done to get feedback or help. Just remember to link us to your code and to the final text of your short story when you are done!]

Theology Swap

I just graduated from a coding bootcamp in Seattle, but before I lived here I was a theology grad student in New Jersey, and also worked with a few churches there. I've never used NLP algorithms before, and since some forms of theology can become so intricate and complex, I thought a fun use for this contest might be to write scripts that 'de-systematize' a systematic theologian. So, I took the first volume of Thomas Aquinas' Summa Theologica (which is ridiculously abstract and systematic), and wrote scripts to pull out just the questions and answers in order to generate trigrams and paragraphs from those.

Here is my code, along with the files where I dumped the questions and answers. After getting my questions/answers to split off correctly, I used Sentence Split, Generate Trigram Frequencies, and Generate Paragraphs from Trigrams on Algorithmia's site to generate a question from the question array, then an answer from the answer array. Then I added some scripts to change 'man/men' (as a 'universal') to 'humanity/people', and shifted many of the pronouns referring to God from 'He' to 'She', because it's a good thing to do, and Aquinas couldn't stop me : ) In this process it also seems that I made up two words 'humanifestation' and 'humanner', so, that's pretty fun...

This was my first real use of Python, and I had so much fun with it. I'm also pretty satisfied with the result. For example, looks like the answer to the third question needed to be preceded with a bite to eat:

Whether God Is Infinite?

I answer that, Nom.

It seems that the swapped up summa is also capable of Aquinas' level of abstraction:

For some said that it possessed some common nature, all things are distinguished from all other bodies must intervene between the distinction of these belong to God, in the order of Divine Providence, were the immediate principle of motion. For what is accidental to the various souls being distinguished accordingly as the formal distinction of things.

My favorite question:

Whether the True and False Are Contraries?

And such an elegant ending:

Therefore that thing.

Here's a link to the full text on a simple site. Thanks for this contest, it was fun and I learned a lot!
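The invented words 'humanifestation' and 'humanner' look like the result of replacing the letters "man" inside longer words such as "manifestation" and "manner". If you want swaps like the ones described in this entry to touch whole words only, a regex with `\b` word boundaries does that. This is a hypothetical sketch, not the submitter's code; the swap table is illustrative, its keys must be lowercase, and note that a blanket "he" to "she" swap changes every "he", not only pronouns referring to God.

```python
import re


def swap_whole_words(text, swaps):
    """Replace whole words only, keeping a leading capital letter.

    `swaps` maps lowercase words to their replacements; matching ignores case.
    """
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, swaps)) + r")\b", re.IGNORECASE
    )

    def replace(match):
        word = match.group(0)
        new = swaps[word.lower()]
        return new[0].upper() + new[1:] if word[0].isupper() else new

    return pattern.sub(replace, text)
```

With `{"man": "humanity", "men": "people"}`, "Man is rational; men err." becomes "Humanity is rational; people err.", while "manner" and "manifestation" are left untouched.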

Arrested Development episode

@jennaplusplus and I worked on an Arrested Development script generator.

The code lives at https://github.com/jennaplusplus/shorties/ and there's a generated episode with 300 lines at https://github.com/jennaplusplus/shorties/blob/master/episode_2.txt

Transcripts for the first three seasons of the show were obtained from http://arresteddevelopment.wikia.com/ with retrieve-transcripts.rb. Trigrams were generated for all characters with more than 10 lines overall, for a total of 71 trigram files (generate-trigrams.rb).

The episode script was generated with generate-episode.rb. The order of characters in the script was selected randomly, weighted by the size of the characters' trigram files. The order of the script was further modified to ensure that the Narrator always starts the episode and a character doesn't speak twice in a row. Each line is 1-5 sentences long.
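The speaker-ordering rules described above (weighted random choice, Narrator first, nobody speaking twice in a row, 1-5 sentences per line) can be sketched as follows. The entrants wrote theirs in Ruby; this Python version is a hypothetical illustration, not a port of generate-episode.rb. The weights could be trigram file sizes, e.g. from `os.path.getsize`.

```python
import random


def plan_episode(weights, num_lines, rng=None, opener="Narrator"):
    """Return (speaker, sentence_count) pairs for an episode script.

    weights maps each character to a weight such as the size of their trigram
    file. The opener always speaks first, nobody speaks twice in a row, and
    each line gets 1-5 sentences.
    """
    rng = rng or random.Random()
    if num_lines <= 0:
        return []
    plan = [(opener, rng.randint(1, 5))]
    while len(plan) < num_lines:
        last = plan[-1][0]
        candidates = [name for name in weights if name != last]
        if not candidates:
            raise ValueError("need at least one other character to alternate with")
        speaker = rng.choices(candidates, weights=[weights[c] for c in candidates])[0]
        plan.append((speaker, rng.randint(1, 5)))
    return plan
```

Excluding the previous speaker from the candidates before drawing, rather than redrawing on a repeat, keeps the remaining characters' relative weights intact.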

WikiHole, the short story

So I get stuck in "wikiholes" pretty frequently, you know, where you just want to look up one thing real quick but somehow you keep clicking on links, and then all of a sudden it's hours later and you have no idea how you got to the article you're on or even what you were first reading about.

My short story is based on this experience! I made it in Ruby. Here is the code & full text.

My story starts with:

You know what I should look up?

Jock Jams

Jock Jams is a series of compilation albums released by Tommy Boy Records.

Then, by randomly selecting links from that page, it ends up traveling to the Wikipedia article on ISBN numbers:

This looks good: International Standard Book Number

The International Standard Book Number (ISBN) is a unique[a][b] numeric commercial book identifier.

Hmm, intriguing.

And it finally ends on the article for square dancing:

Never heard of Traditional square dance

Traditional square dance is a generic American term for any style of American square dance other than modern Western.

This short story code is fun because it is different every time you run it. When you run the script, it prompts you for a URL to a Wikipedia article, so you can choose the starting point for your story.
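The core of a wikihole story is a random walk: from each page, follow one of its links at random. The entry was written in Ruby and fetches real pages; this hypothetical Python sketch leaves out fetching and parsing and instead walks an in-memory map from page titles to the titles they link to.

```python
import random


def wiki_walk(links, start, steps, rng=None):
    """Random walk: from each page, follow a uniformly random outgoing link.

    links maps a page title to the titles it links to. The walk stops early
    at a page with no outgoing links. Returns the list of visited titles.
    """
    rng = rng or random.Random()
    path = [start]
    for _ in range(steps):
        outgoing = links.get(path[-1], [])
        if not outgoing:
            break
        path.append(rng.choice(outgoing))
    return path
```

Each visited title then becomes a beat in the story ("This looks good: ...", "Never heard of ..."), followed by a sentence pulled from that article.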
