- Yes, you can manually change the text file whenever you want.
Learning works by reading the tweets of the parent account and adding them to the corpus text file. Learning will happen whenever the command-line tool is run, assuming that a parent account has been set in the config file. Learning can be disabled with the `--no-learn` option.
@fitnr
Thanks!
A couple of questions about learning, then.
How deep does it "go" into the parent account every time the command is run?
Like, does it grab the last 10 tweets? Last 20?
Also, I reckon specifying the bot's own account as "parent" would cause it to suck its own tweets into the corpus, resulting in a slow degradation of quality (with a possible increase in fun). Would that be correct?
- It looks back to the last time the markov account tweeted. The assumption is that the learning step happens every time the markov account tweets.
- The part about the quality is conceptually correct, except that, the way it's set up, the learning step would never read anything: it only picks up tweets posted since the markov account last tweeted, and when the parent is the markov account itself there are none. Another tool that can regularly read the tweets from the markov account and append them to the corpus file would work. Maybe check out `twurl`. You could run a daily cron task like `twurl '/1.1/statuses/user_timeline.json?count=N' | jq -r '.[].text' >> corpus.txt`, where `N` is the number of tweets the markov account makes a day, and `jq` is a command-line JSON parser.
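For anyone who would rather not install `jq`, here is a rough Python stand-in for the `jq -r '.[].text' >> corpus.txt` step. It is a sketch, not twitter_markov's own learning code, and it assumes the input is the JSON array returned by `statuses/user_timeline`, where each tweet has a numeric `id` and a `text` (or, in extended mode, `full_text`) field. The function name and the optional `since_id` filter are hypothetical; the filter only mimics the "look back since the last tweet" behavior described above. Unlike the `jq` one-liner, it collapses newlines inside a tweet so that each tweet stays on one corpus line.

```python
import json
from typing import Optional


def append_tweets_to_corpus(timeline_json: str, corpus_path: str,
                            since_id: Optional[int] = None) -> int:
    """Append each tweet's text from a user_timeline-style JSON array to
    corpus_path, one tweet per line. Returns the number of lines written.

    Hypothetical helper: if since_id is given, only tweets with a larger
    numeric id are kept.
    """
    tweets = json.loads(timeline_json)
    lines = []
    for tweet in tweets:
        if since_id is not None and tweet["id"] <= since_id:
            continue  # already seen on a previous run
        # Extended-mode responses (tweet_mode=extended) use "full_text".
        text = tweet.get("full_text") or tweet.get("text") or ""
        # Collapse internal whitespace/newlines so one tweet = one line.
        text = " ".join(text.split())
        if text:
            lines.append(text + "\n")
    with open(corpus_path, "a", encoding="utf-8") as corpus:
        corpus.writelines(lines)
    return len(lines)
```

Opening the file in append mode (`"a"`) matches the `>>` redirection, so repeated cron runs grow the corpus rather than overwrite it.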
@fitnr
Thanks!
Two more questions (hopefully not very dumb :) )
When a config has >1 corpus specified, does it (from the point of view of the markov chain formation process) merge them into a single one, as if it were one file?
Does the order of lines in the file practically affect the way the bot treats them (as in, would changing the order of the lines in the corpus(es) affect the probability of it coming up with a particular phrase, given the same state size?)
- If `corpus` is a list, the bot will read from all of the files listed.
- Markov chains randomly recombine text, so the order of lines in a corpus should be immaterial. For questions about the specifics of the Markov implementation used here, see Markovify.
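To see why line order should not matter, here is a toy word-level Markov chain in Python. It is a simplified sketch, not Markovify's actual implementation: it treats each line as a separate sentence padded with placeholder `BEGIN`/`END` markers, and counts how often each state of `state_size` words is followed by each next word. Because those counts are sums over all lines, shuffling the lines leaves the model unchanged.

```python
import random
from collections import Counter, defaultdict

# Placeholder sentence-boundary markers for this toy model.
BEGIN, END = "__BEGIN__", "__END__"


def build_transitions(lines, state_size=2):
    """Count transitions (state -> next word) for a toy Markov chain.

    Each line is an independent sentence, padded with BEGIN markers at the
    start and an END marker at the finish.
    """
    model = defaultdict(Counter)
    for line in lines:
        words = [BEGIN] * state_size + line.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            model[state][words[i + state_size]] += 1
    return model


corpus = ["the cat sat on the mat", "the cat ate", "a dog sat on the cat"]
shuffled = corpus[:]
random.shuffle(shuffled)

# Counts are summed per state, so the order of the lines cannot change them.
assert build_transitions(corpus) == build_transitions(shuffled)
```

The caveat is the per-line assumption: if a corpus were parsed as one continuous stream of text, transitions that span line boundaries would depend on which line follows which.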
@Reapette correction: You can specify multiple texts, which will create multiple models, and you can choose to create texts from any of them (perhaps randomly). To create a combined corpus, just create a file that combines all the texts.
@fitnr
Thanks for explaining.
So, specifying two texts instead of one will not cause it to create one model for one big corpus that is "text1.txt + text2.txt", but will create two models for two separate corpuses?
If that's so, having a "treat all text files as one giant corpus, create one big model" mode would be a great enhancement.
It would make it possible to create a bot with one huge "main" corpus that is updated incrementally by adding stuff to a small separate file, which is IMHO more manageable than adding stuff to an already huge file.
I'm not sure that keeping track of many text files is clearly easier than keeping track of one. And since there are many, many ways to create one file on the fly, it doesn't seem like a pressing need. Why not just have a daily or hourly cron job that `cat`s all the source files into a `mega-corpus.txt`? You can read from that with `--no-learn`, then separately run the script so that it learns into the smaller file without tweeting.
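A minimal Python sketch of that job's concatenation step, equivalent in spirit to `cat a.txt b.txt > mega-corpus.txt`. The function name `build_mega_corpus` and the file names are placeholders, not part of twitter_markov. Unlike plain `cat`, it adds a trailing newline to any source that lacks one, so the last line of one file cannot run into the first line of the next.

```python
from pathlib import Path
from typing import Iterable


def build_mega_corpus(sources: Iterable[str], output: str) -> None:
    """Concatenate several corpus files into one output file (overwriting it).

    Each source is guaranteed to end with a newline before the next one is
    appended, so lines from different files never merge.
    """
    with open(output, "w", encoding="utf-8") as out:
        for source in sources:
            text = Path(source).read_text(encoding="utf-8")
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)


# Example (hypothetical file names), e.g. run from an hourly cron job:
# build_mega_corpus(["main.txt", "recent.txt"], "mega-corpus.txt")
```

Rebuilding the combined file from scratch each time (mode `"w"`) keeps it in sync with the sources; the small file that receives new material can then be edited freely.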
One change that I've already added to HEAD is to allow the bot to read from any file-like object, so using the Python API would let you write a script that reads from an arbitrary source.
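The twitter_markov call that accepts a file-like object isn't shown in this thread, so the following only illustrates the general pattern: a hypothetical loader, `load_corpus_lines`, that relies solely on `.read()` will accept an open file, an `io.StringIO` assembled from any source, or anything else that behaves like a text file.

```python
import io
from typing import List, TextIO


def load_corpus_lines(source: TextIO) -> List[str]:
    """Hypothetical loader: works with any object whose .read() returns str.

    Blank lines are dropped and surrounding whitespace is stripped.
    """
    return [line.strip() for line in source.read().splitlines() if line.strip()]


# An in-memory "file" built from an arbitrary source (database, API, etc.).
in_memory = io.StringIO("first sentence\n\nsecond sentence\n")
print(load_corpus_lines(in_memory))  # ['first sentence', 'second sentence']
```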