tehmillhouse / pymarkovchain
Simple markov chain implementation in python
License: Other
I used a book from Project Gutenberg as the input file. Each line in the input file corresponds to a paragraph in the book, and because PyMarkovChain uses lines as the delimiting unit, calling generateString either produces a very long text or raises a RuntimeError, because the recursion depth is exceeded in _accumulateWithSeed:
>>> print a.generateString()
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "MarkovChain.py", line 76, in generateString
return self._accumulateWithSeed("", "")
File "MarkovChain.py", line 103, in _accumulateWithSeed
return self._accumulateWithSeed(sentence + sep + lastWord, nextWord)
[...]
File "MarkovChain.py", line 103, in _accumulateWithSeed
return self._accumulateWithSeed(sentence + sep + lastWord, nextWord)
File "MarkovChain.py", line 96, in _accumulateWithSeed
nextWord = self._nextWord(lastWord)
File "MarkovChain.py", line 114, in _nextWord
if probmap[candidate] > maxprob:
RuntimeError: maximum recursion depth exceeded in cmp
Programming in a recursive style is a bad idea if you can't rely on tail call optimization to be there.
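A minimal sketch of the same accumulation written as a loop, so that long input lines cannot exhaust the call stack. This is not the library's code: the function name generate_iteratively, the max_words cap, and the database layout (a dict mapping each word to a dict of follower weights, with the empty string marking the start and end of a line) are assumptions made for illustration.

```python
import random


def generate_iteratively(db, max_words=10000, rng=random):
    """Walk a first-order chain with a loop instead of recursion.

    `db` is a hypothetical mapping: word -> {next_word: weight}.
    The empty string "" is assumed to mark both line start and line end.
    """
    words = []
    current = ""
    for _ in range(max_words):
        followers = db.get(current)
        if not followers:
            # Dead end: nothing was ever observed after this word.
            break
        candidates = list(followers)
        weights = [followers[w] for w in candidates]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        if current == "":
            # End-of-line marker reached.
            break
        words.append(current)
    return " ".join(words)
```

Because the loop keeps only the current word and the list of words so far, the length of the generated text is bounded by max_words rather than by the interpreter's recursion limit.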
For better performance
This would require the markov chain to defer calculation of the occurrence probability until during text generation, but should be quite doable.
Also, switching the _nextWord function over to doing integer math will do away with rounding errors and will improve performance. Yay!
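One way to read this suggestion, sketched under assumptions: store raw occurrence counts instead of precomputed probabilities, and pick the next word at generation time with integer arithmetic only. The names build_counts and next_word and the use of "" as a line boundary marker are hypothetical and not taken from the library.

```python
import random
from collections import Counter, defaultdict


def build_counts(lines):
    """Count how often each word follows another; "" marks line boundaries."""
    counts = defaultdict(Counter)
    for line in lines:
        words = [""] + line.split() + [""]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word(counts, last_word, rng=random):
    """Pick a follower of `last_word`, weighted by its integer count."""
    followers = counts.get(last_word)
    if not followers:
        raise KeyError(last_word)
    total = sum(followers.values())
    pick = rng.randrange(total)  # integer in [0, total)
    for word, count in followers.items():
        if pick < count:
            return word
        pick -= count
    raise AssertionError("unreachable: counts always sum to total")
```

No floating-point value is ever computed here, so there is nothing to round; the probability of each follower is implied by its count divided by the total.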
What can PyMarkovChain do?
It seems to me that this project is no longer maintained. What do you think about removing the package from PyPI?
When I run your example code:
from pymarkovchain import MarkovChain
MarkovChain().generateDatabase("This is some language to analyze")
MarkovChain().generateString()
I get the following error:
Database file not found, using empty database
Database file could not be writtenDatabase file not found, using empty database
Traceback (most recent call last):
File "markov.py", line 8, in <module>
print MarkovChain().generateString()
File "/Library/Python/2.7/site-packages/pymarkovchain/MarkovChain.py", line 98, in generateString
return self._accumulateWithSeed('')
File "/Library/Python/2.7/site-packages/pymarkovchain/MarkovChain.py", line 122, in _accumulateWithSeed
nextWord = self._nextWord(seed)
File "/Library/Python/2.7/site-packages/pymarkovchain/MarkovChain.py", line 130, in _nextWord
probmap = self.db[lastword]
KeyError: ''
I've got source text that has some word variations that I'd like to ignore. There are the basic case sensitivity issues. If it were just that, I could str.lower everything, but I'm also dealing with source text with a lot of misspellings. I think I might be able to get around this by using some ratio function from difflib. It would be useful to be able to supply a function which would be used to compare two strings and, if it returns True, consider them to be equal.
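A rough sketch of how such a hook could look, assuming the words are normalized before they reach the chain rather than compared pairwise inside it. The helper make_canonicalizer and its cutoff parameter are hypothetical; difflib.get_close_matches is the standard-library function that ranks candidates by SequenceMatcher ratio.

```python
import difflib


def make_canonicalizer(cutoff=0.8):
    """Return a function mapping each word to a canonical spelling.

    The first spelling seen becomes the canonical one; later words that are
    similar enough (difflib ratio >= cutoff) are folded into it.
    """
    vocabulary = []

    def canonical(word):
        key = word.lower()
        matches = difflib.get_close_matches(key, vocabulary, n=1, cutoff=cutoff)
        if matches:
            return matches[0]
        vocabulary.append(key)
        return key

    return canonical
```

Applying canonical to every word of the input before building the database makes case variants and near misspellings share one entry. The lookup scans the whole vocabulary, so this is only practical for modest vocabularies, and a low cutoff will merge words that are genuinely different.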
For full compatibility between Python 2 and Python 3, use Python 3 style "true division".
Just add
from __future__ import division
If you want it to work in Python versions older than 2.2, you can put a try-except block around this.
You talked about rounding errors in other bugs. Adding this will (probably) solve them.
In most cases, one might want to know if the markov chain didn't manage to accumulate a valid sentence based on a single word seed, so silently ignoring that isn't good practice.
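As an illustration of failing loudly, here is a sketch that raises when the seed word was never seen instead of returning an empty result. The exception class SeedNotInChainError, the function sentence_from_seed, and the dict-of-dicts database layout are all hypothetical; to keep the example deterministic it always follows the most frequent follower, which is a simplification and not how a Markov chain normally samples.

```python
class SeedNotInChainError(ValueError):
    """Hypothetical exception for a seed word the chain has never seen."""


def sentence_from_seed(db, seed, max_words=50):
    """Build a sentence starting at `seed`, or raise if that is impossible.

    `db` is assumed to map word -> {next_word: weight}, "" marking line end.
    """
    if seed not in db:
        raise SeedNotInChainError(
            "seed %r does not occur in the database" % (seed,)
        )
    words = [seed]
    current = seed
    while len(words) < max_words:
        followers = db.get(current, {})
        if not followers:
            break
        # Simplification: take the most frequent follower, no randomness.
        current = max(followers, key=followers.get)
        if current == "":
            break
        words.append(current)
    return " ".join(words)
```

A caller that prefers the old behaviour can still catch the exception and fall back to an empty string, but the default no longer hides the failure.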
Hi,
I need a wheel file for this package. Where can I find it, please?
Thanks & Regards,
Siva
Is the current master of PyMarkovChain stable enough to add to PyPI? I found the argument n useful in the MarkovChain initialization. Thanks!
If I understand the code for this correctly, this is just a first order Markov chain. So for instance, if you have a sentence, "The sky is blue", it maps "The" -> "sky", "sky" -> "is", and "is" -> "blue". A higher order one would also consider "The sky" -> "is", "sky is" -> "blue", and so on. A keyword argument to specify the order would probably be best, as there are probably disadvantages of having orders be too high.
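The mapping described above can be sketched with the order as a keyword argument. This is an illustration only: build_chain, generate, and the tuple-keyed layout are assumptions and do not reflect how the library stores its database.

```python
import random
from collections import Counter, defaultdict


def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to counts of the next word."""
    words = text.split()
    chain = defaultdict(Counter)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state][words[i + order]] += 1
    return chain


def generate(chain, state, max_words=20, rng=random):
    """Extend `state` (a tuple of order >= 1 words) until no follower is known."""
    order = len(state)
    out = list(state)
    for _ in range(max_words):
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break
        candidates = list(followers)
        weights = [followers[w] for w in candidates]
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return " ".join(out)
```

With order=1 this reduces to the first-order mapping ("The" -> "sky", "sky" -> "is", "is" -> "blue"). With order=2 the keys become ("The", "sky") and ("sky", "is"). Higher orders need much more input text before each state has more than one follower, which is one of the disadvantages hinted at above.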