tehmillhouse / pymarkovchain

Simple markov chain implementation in python

License: Other

Python 100.00%

pymarkovchain's Issues

If the input data has extremely long lines, generateString can cause a RuntimeError

I used a book from Project Gutenberg as the input file. Each line in the input file corresponds to a paragraph in the book, and because PyMarkovChain uses lines as the delimiting unit, calling generateString either returns a very long text or raises a RuntimeError when the recursion depth is exceeded in _accumulateWithSeed:

>>> print a.generateString()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "MarkovChain.py", line 76, in generateString
    return self._accumulateWithSeed("", "")
  File "MarkovChain.py", line 103, in _accumulateWithSeed
    return self._accumulateWithSeed(sentence + sep + lastWord, nextWord)
[...]
  File "MarkovChain.py", line 103, in _accumulateWithSeed
    return self._accumulateWithSeed(sentence + sep + lastWord, nextWord)
  File "MarkovChain.py", line 96, in _accumulateWithSeed
    nextWord = self._nextWord(lastWord)
  File "MarkovChain.py", line 114, in _nextWord
    if probmap[candidate] > maxprob:
RuntimeError: maximum recursion depth exceeded in cmp

Sentence generation shouldn't be recursive

Programming in a recursive style is a bad idea if you can't rely on tail call optimization to be there.
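
A minimal sketch of what a loop-based generator could look like. It is not the library's code: the function name generate_sentence, the transitions dict (word -> list of possible successors), and the use of the empty string as start and end marker are all assumptions made for illustration.

```python
import random


def generate_sentence(transitions, start="", max_words=50, rng=random):
    """Walk the chain with a loop instead of recursion.

    `transitions` is a hypothetical dict mapping a word to a list of
    words that may follow it; "" stands for sentence start and end.
    """
    words = []
    current = start
    for _ in range(max_words):
        candidates = transitions.get(current)
        if not candidates:
            break  # dead end: no known successor for this word
        current = rng.choice(candidates)
        if current == "":
            break  # end-of-sentence marker in this sketch
        words.append(current)
    return " ".join(words)
```

Because the loop keeps its state in local variables, the length of the output is bounded by max_words rather than by the interpreter's recursion limit.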

use cPickle instead of pickle

For better performance

http://docs.python.org/2/library/pickle.html#module-cPickle
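
One common way to do this while staying importable on Python 3 is a guarded import. The sketch below assumes the database is a plain dict; the helper names save_db and load_db are hypothetical and not part of the library.

```python
try:
    import cPickle as pickle  # Python 2: C implementation of pickle
except ImportError:
    import pickle  # Python 3: the C accelerator is picked up automatically


def save_db(db, path):
    """Serialize the database dict to a file (hypothetical helper)."""
    with open(path, "wb") as handle:
        # Protocol 2 is readable by both Python 2 and Python 3.
        pickle.dump(db, handle, protocol=2)


def load_db(path):
    """Load a database dict previously written by save_db."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```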

Ability to add text samples to existing database

This would require the markov chain to defer calculation of the occurrence probability until during text generation, but should be quite doable.

Also, switching the _nextWord function over to doing integer math will do away with rounding errors and will improve performance. Yay!
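
A rough sketch of how both ideas could fit together, assuming the database stores raw occurrence counts instead of probabilities. The class name CountingChain and its methods are hypothetical; they only illustrate deferring the probability calculation to generation time and drawing the next word with integer arithmetic.

```python
import random
from collections import defaultdict


class CountingChain:
    """Stores integer follower counts so new text can be added at any time."""

    def __init__(self):
        # counts[previous_word][next_word] -> number of occurrences
        self.counts = defaultdict(lambda: defaultdict(int))

    def add_text(self, text):
        """Merge another text sample into the existing counts."""
        for line in text.splitlines():
            words = line.split()
            if not words:
                continue
            prev = ""  # "" marks the start of a line
            for word in words + [""]:  # trailing "" marks the end
                self.counts[prev][word] += 1
                prev = word

    def next_word(self, last_word, rng=random):
        """Pick a follower with probability proportional to its count."""
        followers = self.counts.get(last_word)
        if not followers:
            raise KeyError(last_word)
        total = sum(followers.values())
        pick = rng.randrange(total)  # integer in [0, total)
        for word, count in followers.items():
            if pick < count:
                return word
            pick -= count
```

Since only integers are stored and compared, adding a sample is a matter of incrementing counters, and no floating-point normalization is needed.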

What is it good for?

What can PyMarkovChain do?

It seems to me that this project is no longer maintained. What do you think about removing the package from PyPI?

Example not working

When I run your example code:

from pymarkovchain import MarkovChain
MarkovChain().generateDatabase("This is some language to analyze")
MarkovChain().generateString()

I get the following error:

Database file not found, using empty database

Database file could not be written
Database file not found, using empty database

Traceback (most recent call last):
  File "markov.py", line 8, in <module>
    print MarkovChain().generateString()
  File "/Library/Python/2.7/site-packages/pymarkovchain/MarkovChain.py", line 98, in generateString
    return self._accumulateWithSeed('')
  File "/Library/Python/2.7/site-packages/pymarkovchain/MarkovChain.py", line 122, in _accumulateWithSeed
    nextWord = self._nextWord(seed)
  File "/Library/Python/2.7/site-packages/pymarkovchain/MarkovChain.py", line 130, in _nextWord
    probmap = self.db[lastword]
KeyError: ''

Add a way to make words compare equal

My source text has some word variations that I'd like to ignore. There are the basic case-sensitivity issues; if that were all, I could just apply str.lower to everything. But I'm also dealing with source text that has a lot of misspellings. I think I might be able to work around this with a ratio function from difflib. It would be useful to be able to supply a function that compares two strings; if it returns True, the chain would treat them as equal.
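
One possible shape for this, sketched with difflib from the standard library: instead of a pairwise comparison callback, each word is mapped to a canonical form before it enters the chain. The function name make_normalizer and the 0.8 cutoff are assumptions for illustration, not an existing PyMarkovChain option.

```python
import difflib


def make_normalizer(cutoff=0.8):
    """Return a function mapping near-identical words to one canonical form."""
    known = []  # canonical spellings seen so far, in order of first appearance

    def normalize(word):
        word = word.lower()
        # Reuse an earlier spelling if one is similar enough.
        match = difflib.get_close_matches(word, known, n=1, cutoff=cutoff)
        if match:
            return match[0]
        known.append(word)
        return word

    return normalize
```

The first spelling encountered wins, so a misspelling seen before the correct form would become the canonical one. The lookup also scans all known words, which may be slow for large vocabularies.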

For full compatibility between Python 2 and Python 3, use Python 3 style "true division"

Just add

from __future__ import division

This works on Python 2.2 and later. Note that a from __future__ import must appear at the top of the module, so it cannot be wrapped in a try-except block to support older versions.

You talked about rounding errors in other bugs. Adding this will (probably) solve them.

Stop markov chain silently failing to complete words

In most cases, one might want to know if the markov chain didn't manage to accumulate a valid sentence based on a single word seed, so silently ignoring that isn't good practice.
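
A small sketch of the alternative, raising a dedicated exception when the seed has no recorded successors. The exception class UnknownSeedError, the helper followers_for, and the transitions dict are hypothetical names used only for this example.

```python
class UnknownSeedError(LookupError):
    """Raised when the chain has never seen the requested seed word."""


def followers_for(transitions, seed):
    """Return the successors of `seed`, or fail loudly if there are none."""
    if seed not in transitions:
        raise UnknownSeedError("no successors recorded for seed %r" % (seed,))
    return transitions[seed]
```

Callers that prefer the old behaviour can still catch the exception and fall back to an empty result.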

wheel file

Hi,

I need a wheel file for this package. Where can I find it, please?

Thanks & Regards,
Siva

pypi behind the times

Is the current master of PyMarkovChain stable enough to publish to PyPI? I found the argument n in the MarkovChain initialization useful. Thanks!

Add ability to do an nth order Markov chain

If I understand the code for this correctly, this is just a first order Markov chain. So for instance, if you have a sentence, "The sky is blue", it maps "The" -> "sky", "sky" -> "is", and "is" -> "blue". A higher order one would also consider "The sky" -> "is", "sky is" -> "blue", and so on. A keyword argument to specify the order would probably be best, as there are probably disadvantages of having orders be too high.
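
To make the idea concrete, here is a minimal sketch of building such a table, using the "The sky is blue" example from above. The function name build_ngram_table and the order keyword are assumptions for illustration; this is not how the library itself stores its data.

```python
from collections import defaultdict


def build_ngram_table(text, order=2):
    """Map each tuple of `order` consecutive words to the words that follow it."""
    table = defaultdict(list)
    for line in text.splitlines():
        words = line.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            table[state].append(words[i + order])
    return table


table = build_ngram_table("The sky is blue", order=2)
# table[("The", "sky")] is ["is"] and table[("sky", "is")] is ["blue"]
```

With order=1 this reduces to the first-order mapping described above. Higher orders reproduce the source text more closely but need much more input before any state has more than one successor.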