Corpus updates and learning [questions] about twitter_markov (closed, 8 comments)

Reapette commented on September 28, 2024

Corpus updates and learning [questions]

Comments (8)

fitnr commented on September 28, 2024

Yes, you can manually change the text file whenever you want.

Learning works by reading the tweets of the parent account and adding them to the corpus text file. Learning will happen whenever the command line tool is run, assuming that a parent account has been set in the config file. Learning can be disabled with the --no-learn option.
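A minimal sketch of the append step described above. It assumes one tweet per corpus line. `append_to_corpus` is a hypothetical helper written for illustration, not twitter_markov's own code.

```python
from typing import Iterable


def append_to_corpus(texts: Iterable[str], corpus_path: str) -> int:
    """Append each text as one corpus line and return how many lines were written.

    Hypothetical helper: whitespace inside a tweet (including newlines) is
    collapsed to single spaces so every tweet occupies exactly one line.
    """
    written = 0
    with open(corpus_path, "a", encoding="utf-8") as corpus:  # "a" = append, never overwrite
        for text in texts:
            line = " ".join(text.split())
            if line:  # skip tweets that are empty after collapsing whitespace
                corpus.write(line + "\n")
                written += 1
    return written
```

Opening in append mode is what makes manual edits safe to mix with learning: anything already in the file stays where it is.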

Reapette commented on September 28, 2024

@fitnr
Thanks!
A couple of questions about learning, then:

How deep does it "go" into the parent account every time the command is run?
Like, does it grab the last 10 tweets? The last 20?

Also, I reckon specifying the bot's own account as "parent" would cause it to suck its own tweets into the corpus, resulting in a slow degradation of quality (with a possible increase in fun). Would that be correct?

fitnr commented on September 28, 2024

It reads the parent account's tweets posted since the markov account last tweeted. The assumption is that the learning step runs every time the markov account tweets.
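The look-back rule can be sketched as a simple filter. This assumes tweets are represented as hypothetical `(id, text)` pairs and relies on Twitter ids being roughly time-ordered; `tweets_to_learn` is an illustrative name, not twitter_markov's implementation.

```python
from typing import List, Tuple

# Hypothetical representation: (tweet id, text). Twitter ids are roughly
# time-ordered, so a larger id means a later tweet.
Tweet = Tuple[int, str]


def tweets_to_learn(parent_timeline: List[Tweet], last_markov_id: int) -> List[str]:
    """Return texts of parent tweets that are newer than the markov account's latest tweet."""
    return [text for tweet_id, text in parent_timeline if tweet_id > last_markov_id]


# If the parent *is* the markov account, last_markov_id is the newest id in
# that very timeline, so the filter can never match anything.
```

The last comment is the crux of the self-parent case: no tweet in a timeline is newer than that timeline's newest tweet.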
The part about the quality is conceptually correct, except that, the way it's set up, the learning step would never read anything: the markov account never has tweets newer than its own last tweet. Another tool that can regularly read the tweets from the markov account and append them to the corpus file would work. Maybe check out twurl. You could run a daily cron task like `twurl '/1.1/statuses/user_timeline.json?count=N' | jq -r '.[].text' >> corpus.txt`, where N is the number of tweets the markov account makes per day, and jq is a command-line JSON parser.
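If jq isn't available, the `jq -r '.[].text' >> corpus.txt` half of that pipeline can be approximated in Python. This sketch assumes the endpoint returns a JSON array of tweet objects that each carry a `text` field; both function names are hypothetical.

```python
import json
from typing import List


def timeline_texts(raw_json: str) -> List[str]:
    """Mimic `jq -r '.[].text'`: take the "text" field of each tweet in a JSON array."""
    tweets = json.loads(raw_json)
    return [tweet["text"] for tweet in tweets]


def append_timeline(raw_json: str, corpus_path: str) -> int:
    """Mimic `>> corpus.txt`: append each tweet text as its own line; return the count."""
    texts = timeline_texts(raw_json)
    with open(corpus_path, "a", encoding="utf-8") as corpus:
        for text in texts:
            corpus.write(text + "\n")
    return len(texts)
```

A small script could call `append_timeline(sys.stdin.read(), "corpus.txt")` and take twurl's output on stdin. Like the jq version, it writes any newlines inside a tweet through unchanged.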

Reapette commented on September 28, 2024

@fitnr
Thanks!

Two more questions (hopefully not very dumb :) )

When a config has more than one corpus specified, does it (from the point of view of the markov chain formation process) merge them into a single one, as if it were one file?

Does the order of lines in the file affect, in practice, how the bot treats them? That is, would changing the order of lines in the corpus (or corpora) affect the probability of it coming up with a particular phrase, given the same state size?

fitnr commented on September 28, 2024

If corpus is a list, the bot will read from all of the files listed.
Markov chains randomly recombine text, so the order of lines in a corpus should be immaterial. For questions about the specifics of the Markov implementation used here, see Markovify.
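Why line order is immaterial can be seen in a toy chain: the model is just a table of "which word follows this state, and how often", and reordering lines only changes the order in which the same counts are accumulated. This is a simplified illustration with hypothetical begin/end markers, not Markovify's actual code.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

BEGIN, END = "__BEGIN__", "__END__"  # hypothetical sentence-boundary markers


def build_chain(lines: List[str], state_size: int = 2) -> Dict[Tuple[str, ...], Counter]:
    """Count, for every state of `state_size` words, which word follows it."""
    chain: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for line in lines:
        words = [BEGIN] * state_size + line.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i : i + state_size])
            chain[state][words[i + state_size]] += 1
    return dict(chain)
```

Since generation samples the next word in proportion to these counts, two corpora with the same lines in a different order yield identical transition probabilities.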

fitnr commented on September 28, 2024

@Reapette correction: You can specify multiple texts, which will create multiple models, and you can choose to generate text from any of them (perhaps at random). To create a combined corpus, just create a file that combines all the texts.
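Choosing among several separately trained models "perhaps randomly" could look like the sketch below. A model is abstracted as any zero-argument callable that returns a sentence; that abstraction and the function name are assumptions, not twitter_markov's API.

```python
import random
from typing import Callable, Optional, Sequence

Model = Callable[[], str]  # hypothetical: anything that returns one generated sentence


def generate_from_random_model(models: Sequence[Model],
                               weights: Optional[Sequence[float]] = None,
                               rng: Optional[random.Random] = None) -> str:
    """Pick one independently trained model (optionally weighted) and generate with it."""
    rng = rng or random.Random()
    model = rng.choices(list(models), weights=weights, k=1)[0]
    return model()
```

With Markovify, each callable could be the `make_sentence` method of a `markovify.Text`. Note that each sentence still comes from a single model, so transitions are never mixed across texts. Markovify's README also documents `markovify.combine`, which merges models into one and is the closer equivalent of a single combined corpus.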

Reapette commented on September 28, 2024

@fitnr
Thanks for explaining.

So, specifying two texts instead of one will not cause it to create one model for one big corpus that is "text1.txt + text2.txt", but will create two models for two separate corpora?

If that's so, having a "treat all text files as one giant corpus, create one big model" mode would be a great enhancement.

It would make it possible to create a bot with one huge "main" corpus that is updated incrementally by adding material to a small separate file, which is IMHO more manageable than appending to an already huge file.

fitnr commented on September 28, 2024

I'm not sure keeping track of many text files is clearly easier than keeping track of one. And since there are many, many ways to create one file on the fly, it doesn't seem like a pressing need. Why not just have a daily or hourly cron job that cats all the source files into a mega-corpus.txt? You can read from that with --no-learn, then separately run the script to learn, but not tweet, into the smaller file.
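The cron step that builds mega-corpus.txt might be sketched in Python as follows. File names are hypothetical. Unlike plain `cat`, it inserts a newline when a source file lacks a trailing one. It also writes to a temporary file and renames it into place, so a bot run that overlaps the cron job never reads a half-written corpus.

```python
import os
import tempfile
from typing import Sequence


def build_mega_corpus(sources: Sequence[str], dest: str) -> None:
    """Concatenate every source corpus into `dest`, roughly like `cat a.txt b.txt > dest`."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # The temp file lives next to dest so os.replace stays on one filesystem (atomic rename).
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            for path in sources:
                with open(path, encoding="utf-8") as src:
                    text = src.read()
                out.write(text)
                if text and not text.endswith("\n"):
                    out.write("\n")  # keep one file's last line apart from the next file's first
        os.replace(tmp_path, dest)
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind on failure
        raise
```

Usage: `build_mega_corpus(["main.txt", "additions.txt"], "mega-corpus.txt")`, scheduled however often the small file changes.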

One change that I've already added to HEAD is to allow the bot to read from any file-like object, so using it from Python would let you write a script that reads from an arbitrary source.
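To illustrate what accepting a file-like object allows, the sketch below gathers text from several sources into one in-memory stream and hands it to a consumer. `read_corpus` stands in for whatever entry point the bot exposes; it and `combined_stream` are hypothetical names, not twitter_markov's API.

```python
import io
from typing import Iterable, List, TextIO


def read_corpus(fileobj: TextIO) -> List[str]:
    """Hypothetical consumer: accepts any text file-like object and returns its non-empty lines."""
    return [line.rstrip("\n") for line in fileobj if line.strip()]


def combined_stream(chunks: Iterable[str]) -> io.StringIO:
    """Build an in-memory file-like object from text gathered from arbitrary sources."""
    buf = io.StringIO()
    for chunk in chunks:
        buf.write(chunk)
        if chunk and not chunk.endswith("\n"):
            buf.write("\n")  # keep chunks from running together on one line
    buf.seek(0)  # rewind so the consumer reads from the beginning
    return buf
```

The chunks could come from a big main file plus a small file of additions, a database query, or an HTTP response, and the consumer never needs to know.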