fitnr commented on September 28, 2024
  1. Yes, you can manually change the text file whenever you want.

Learning works by reading the tweets of the parent account and adding them to the corpus text file. Learning will happen whenever the command line tool is run, assuming that a parent account has been set in the config file. Learning can be disabled with the --no-learn option.
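Here is a minimal sketch of that learning step. It assumes the parent account's tweets have already been fetched as plain strings. `learn` and its `no_learn` parameter are hypothetical names chosen to mirror the `--no-learn` option. This is not twitter_markov's actual code.

```python
from pathlib import Path
from typing import Iterable


def learn(corpus_path: str, tweets: Iterable[str], no_learn: bool = False) -> int:
    """Append each tweet to the corpus file, one tweet per line.

    Hypothetical helper mirroring the described behaviour: learning adds the
    parent account's tweets to the corpus, and --no-learn skips the step.
    Returns the number of lines appended.
    """
    if no_learn:
        return 0
    # Collapse internal whitespace/newlines so each tweet stays on one line.
    lines = [" ".join(t.split()) for t in tweets]
    lines = [line for line in lines if line]
    with Path(corpus_path).open("a", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
    return len(lines)
```

Appending, rather than rewriting, means a hand-edited corpus file is left alone apart from the new lines.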

from twitter_markov.

Reapette commented on September 28, 2024

@fitnr
Thanks!
A couple of questions about learning, then:

How far back into the parent account's timeline does it go each time the command is run?
Like, does it grab the last 10 tweets? The last 20?

Also, I reckon specifying the bot's own account as "parent" would cause it to suck its own tweets into the corpus, resulting in slow degradation of quality (with possible increase of fun), would that be correct?


fitnr commented on September 28, 2024
  1. It reads the parent's tweets posted since the last time the markov account tweeted. The assumption is that the learning step happens every time the markov account tweets.
  2. The part about quality is conceptually correct. However, the way it's set up, learning would never read anything. If the parent is the bot itself, none of its tweets are ever newer than its own last tweet. A separate tool that regularly reads the markov account's tweets and appends them to the corpus file would work. Maybe check out twurl. You could run a daily cron task like twurl '/1.1/statuses/user_timeline.json?count=N' | jq -r '.[].text' >> corpus.txt, where N is the number of tweets the markov account makes per day and jq is a command-line JSON parser.
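If you'd rather not depend on jq, the `.[].text` extraction can be sketched in Python. This assumes the response is the JSON array returned by the v1.1 `statuses/user_timeline` endpoint, with each tweet carrying `id` and `text` fields. `timeline_texts` and its `since_id` filter are hypothetical and not part of twitter_markov. The filter keeps only tweets newer than a given ID, loosely like "since the last time the markov account tweeted".

```python
import json
from typing import List, Optional


def timeline_texts(timeline_json: str, since_id: Optional[int] = None) -> List[str]:
    """Return tweet texts from a statuses/user_timeline JSON array.

    Roughly what `jq -r '.[].text'` does. The optional since_id filter keeps
    only tweets whose ID is greater than since_id (tweet IDs grow over time).
    """
    tweets = json.loads(timeline_json)
    texts = []
    for tweet in tweets:
        if since_id is not None and tweet["id"] <= since_id:
            continue  # older than (or equal to) the cutoff tweet
        texts.append(tweet["text"])
    return texts
```

The returned strings can then be appended to corpus.txt one per line, which is what the `>>` redirect does in the shell version.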


Reapette commented on September 28, 2024

@fitnr
Thanks!

Two more questions (hopefully not very dumb :) )

When a config has more than one corpus specified, does it (from the point of view of the markov chain formation process) merge them into a single one, as if it were one file?

In practice, does the order of lines in the file affect how the bot treats them? That is, would reordering the lines in the corpus (or corpora) change the probability of it coming up with a particular phrase, given the same state size?


fitnr commented on September 28, 2024
  1. If corpus is a list, the bot will read from all of the files listed.
  2. Markov chains randomly recombine text, so the order of lines in a corpus should be immaterial. For questions about the specifics of the Markov implementation used here, see Markovify.
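A toy word-level chain shows why line order doesn't matter. This is an illustrative sketch in the spirit of Markovify, not its actual implementation. Each line is treated as a separate sentence, the state is the previous `state_size` words, and the model is just a table of transition counts. Counting does not depend on order, so shuffling the lines produces the same table and therefore the same probabilities.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

BEGIN, END = "__BEGIN__", "__END__"


def build_chain(lines: Iterable[str], state_size: int = 2) -> Dict[Tuple[str, ...], Counter]:
    """Count word transitions, treating each line as an independent sentence.

    Toy model (not Markovify's code): the key is the previous `state_size`
    words, and the Counter records which word followed that state.
    """
    chain: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for line in lines:
        words = line.split()
        if not words:
            continue
        # Pad with BEGIN markers so the first words also get a state.
        padded = [BEGIN] * state_size + words + [END]
        for i in range(len(padded) - state_size):
            state = tuple(padded[i : i + state_size])
            chain[state][padded[i + state_size]] += 1
    return dict(chain)


corpus = ["the cat sat", "the dog sat down", "a cat ran"]
shuffled = corpus[:]
random.Random(0).shuffle(shuffled)
# Counting is order-independent, so both orderings give the same model.
print(build_chain(corpus) == build_chain(shuffled))  # True
```

The order of words within a line still matters. Only the order of lines is irrelevant. With a fixed random seed, the exact generated text may differ between orderings, but the distribution of generated phrases does not.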


fitnr commented on September 28, 2024

@Reapette correction: You can specify multiple texts, which will create multiple models, and you can choose to create texts from any of them (perhaps randomly). To create a combined corpus, just create a file that combines all the texts.
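Here is a rough sketch of the "multiple models, pick one at random" setup described in the correction. `ToyModel` and `tweet_from_random_model` are hypothetical. `ToyModel` is a deliberately tiny first-order chain standing in for a real Markov model. The point is that each text gets its own model, so a generated sentence never mixes words from different corpora.

```python
import random
from collections import defaultdict
from typing import Dict, List

BEGIN, END = "__BEGIN__", "__END__"


class ToyModel:
    """A tiny first-order word chain; a stand-in for a real Markov model."""

    def __init__(self, lines: List[str]) -> None:
        self.chain: Dict[str, List[str]] = defaultdict(list)
        for line in lines:
            words = [BEGIN] + line.split() + [END]
            # Record every observed successor; duplicates act as weights.
            for prev, nxt in zip(words, words[1:]):
                self.chain[prev].append(nxt)

    def make_sentence(self, rng: random.Random) -> str:
        word, out = BEGIN, []
        while True:
            word = rng.choice(self.chain[word])
            if word == END:
                return " ".join(out)
            out.append(word)


def tweet_from_random_model(models: Dict[str, ToyModel], rng: random.Random) -> str:
    """Pick one of several independent models at random and generate from it."""
    name = rng.choice(sorted(models))
    return models[name].make_sentence(rng)


# One model per corpus file (file names are placeholders).
models = {
    "text1.txt": ToyModel(["the cat sat on the mat"]),
    "text2.txt": ToyModel(["a dog ran in the park"]),
}
print(tweet_from_random_model(models, random.Random(42)))
```

If you want a single blended model without merging files, Markovify's README documents a `markovify.combine(models, weights)` helper. Check the Markovify docs for how it handles different model types.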


Reapette commented on September 28, 2024

@fitnr
Thanks for explaining.

So, specifying two texts instead of one will not create one model for one big corpus that is "text1.txt + text2.txt", but will instead create two models for two separate corpora?

If that's so, having a "treat all text files as one giant corpus, create one big model" mode would be a great enhancement.

It would make it possible to build a bot with one huge "main" corpus that is updated incrementally by adding material to a small separate file. IMHO that is more manageable than adding to an already huge file.


fitnr commented on September 28, 2024

I'm not sure keeping track of many text files is clearly easier than keeping track of one. And since there are many, many ways to create one file on the fly, it doesn't seem like a pressing need. Why not just have a daily or hourly cron job that cats all the source files into a mega-corpus.txt? You can read from that with --no-learn, then separately run the script to learn (without tweeting) into the smaller file.
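The cron suggestion boils down to concatenating files, which could be sketched in Python as shown below. `build_mega_corpus` is a hypothetical helper, and the file names are placeholders. It writes to a temporary file and then renames it with `os.replace`. A bot that reads mega-corpus.txt mid-update therefore sees either the old or the new version, never a partial one.

```python
import os
import tempfile
from typing import Iterable


def build_mega_corpus(sources: Iterable[str], dest: str) -> None:
    """Concatenate corpus files into one, like `cat a.txt b.txt > dest`.

    Writes to a temp file in the destination directory, then renames it over
    `dest` so readers never observe a half-written corpus.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            for src in sources:
                with open(src, encoding="utf-8") as fh:
                    text = fh.read()
                out.write(text)
                # Unlike plain cat, make sure files don't run together on one line.
                if text and not text.endswith("\n"):
                    out.write("\n")
        os.replace(tmp_path, dest)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


# Example (placeholder paths), e.g. run hourly from cron:
# build_mega_corpus(["main.txt", "learned.txt"], "mega-corpus.txt")
```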

One change I've already added to HEAD allows the bot to read from any file-like object, so using it from Python would let you write a script that reads from an arbitrary source.
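Because the bot can read from any file-like object, the "one giant corpus" idea doesn't require a merged file on disk. This sketch assumes only that the consumer calls `.read()` or iterates over lines. `combined_corpus` and `read_corpus` are hypothetical names. The actual entry point and its signature in twitter_markov's HEAD aren't shown in this thread.

```python
import io
from typing import IO, Iterable, List


def combined_corpus(sources: Iterable[IO[str]]) -> io.StringIO:
    """Join several text streams into one in-memory file-like object."""
    parts = []
    for src in sources:
        text = src.read()
        # Keep the last line of one source from merging with the next.
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    return io.StringIO("".join(parts))


def read_corpus(fileobj: IO[str]) -> List[str]:
    """Hypothetical stand-in for a bot that accepts any file-like corpus."""
    return [line.rstrip("\n") for line in fileobj if line.strip()]


main = io.StringIO("big corpus line 1\nbig corpus line 2\n")
recent = io.StringIO("freshly learned tweet")
print(read_corpus(combined_corpus([main, recent])))
# ['big corpus line 1', 'big corpus line 2', 'freshly learned tweet']
```

In a real script, the `io.StringIO` inputs would be replaced by files from `open(...)` or any other text stream.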
