nschaetti / pytweetbot

A Twitter bot written in Python to replace yourself, search and publish news about specific subjects on Twitter. PyTweetBot uses machine learning to filter interesting articles and web pages found on the web.

License: GNU General Public License v3.0

Python 99.77% HTML 0.23%
twitter twitter-bot twitter-api python

pytweetbot's Introduction


A Twitter bot and library written in Python to replace yourself, search and publish news about specific subjects on Twitter, and automate content publishing.

Join our community to create datasets and deep-learning models! Chat with us on Gitter and join the Google Group to collaborate with us.

This repository consists of:

  • pytweetbot.config : Configuration file management;
  • pytweetbot.db : MySQL database management;
  • pytweetbot.directmessages : Twitter direct message functions;
  • pytweetbot.docs : Documentation;
  • pytweetbot.executor : Functions and objects to execute actions;
  • pytweetbot.friends : Functions and objects to manage friends and followers;
  • pytweetbot.learning : Machine learning functions;
  • pytweetbot.mail : Mail functions;
  • pytweetbot.news : Manage news acquisition and sources;
  • pytweetbot.patterns : Python class patterns;
  • pytweetbot.retweet : Manage retweets and sources;
  • pytweetbot.stats : Statistics;
  • pytweetbot.templates : HTML templates for mail;
  • pytweetbot.tools : Tools;
  • pytweetbot.tweet : Manage tweets;
  • pytweetbot.twitter : Manage access to Twitter;

Getting started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

You need the following packages to install pyTweetBot.

  • nltk
  • argparse
  • logging
  • tweepy
  • sklearn
  • pygithub
  • brotli
  • httplib2
  • urlparse2
  • HTMLParser
  • bs4
  • simplejson
  • dnspython
  • dill
  • lxml
  • sqlalchemy
  • feedparser
  • textblob
  • numpy
  • scipy
  • mysql-python

Installation

pip install pyTweetBot

Authors

License

This project is licensed under the GPLv3 License - see the LICENSE file for details.

Configuration

Configuration file

pyTweetBot reads its configuration from a JSON file, which looks as follows:

{
	"database" :
	{
		"host" : "",
		"username" : "",
		"password" : "",
		"database" : ""
	},
	"email" : "[email protected]",
	"scheduler" :
	{
		"sleep": [6, 13]
	},
	"hashtags":
	[
	],
	"twitter" :
	{
		"auth_token1" : "",
		"auth_token2" : "",
		"access_token1" : "",
		"access_token2" : "",
		"user" : ""
	},
	"friends" :
	{
		"max_new_followers" : 40,
		"max_new_unfollow" : 40,
		"interval" : [15, 60],
		"unfollow_interval" : 604800
	},
	"forbidden_words" :
	[
	],
	"direct_message" : "",
	"tweet" : {
		"max_tweets" : 1800,
		"exclude" : [],
		"interval" : [4.0, 6.0],
		"intervals" : [
			{
				"day": 5,
				"start": 17,
				"end": 23,
				"interval" : [1.0, 3.0]
			}
		]
	},
	"news" :
	[
		{
			"keyword" : "",
			"countries" : ["us","fr"],
			"languages" : ["en","fr"],
			"hashtags" : []
		}
	],
	"rss" :
	[
		{"url" : "http://feeds.feedburner.com/TechCrunch/startups", "hashtags" : "#startups", "lang": ["en"]},
		{"url" : "http://feeds.feedburner.com/TechCrunch/fundings-exits", "hashtags" : "#fundings", "lang": ["en"]}
	],
	"retweet" :
	{
		"max_retweets" : 600,
		"max_likes" : 0,
		"keywords" : [],
		"nbpages" : 40,
		"retweet_prob" : 0.5,
		"limit_prob" : 1.0,
		"interval" : [2.0, 4.0]
	},
	"github" :
	{
		"login": "",
		"password": "",
		"exclude": [],
		"topics" : []
	}
}

There are two required sections:

  • Database : contains the information to connect to the MySQL database (host, username, password, database)
  • Twitter : contains the information for the Twitter API (auth and access tokens)
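
As a minimal sketch of what the two required sections imply, the configuration file can be checked before the bot is started. The helper names `check_config` and `load_config` are hypothetical and are not part of pyTweetBot; the key names are copied from the sample configuration above, and treating the Twitter `user` field as required is an assumption.

```python
import json

# Required sections and keys, copied from the sample configuration.
# Assumption: every listed key must be present and non-empty.
REQUIRED = {
    "database": ("host", "username", "password", "database"),
    "twitter": ("auth_token1", "auth_token2",
                "access_token1", "access_token2", "user"),
}


def check_config(config):
    """Return a list of problems found in a parsed configuration dict."""
    problems = []
    for section, keys in REQUIRED.items():
        if section not in config:
            problems.append("missing section: " + section)
            continue
        for key in keys:
            # An empty string counts as missing, as in the blank template.
            if not config[section].get(key):
                problems.append("empty or missing key: %s.%s" % (section, key))
    return problems


def load_config(path):
    """Read a JSON configuration file and return it as a dict."""
    with open(path) as handle:
        return json.load(handle)
```

Calling `check_config(load_config("/path/to/config.json"))` returns an empty list when both sections are filled in.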

Database configuration

The database part of the configuration file looks like the following:

"database" :
{
    "host" : "",
    "username" : "",
    "password" : "",
    "database" : ""
}

This section is mandatory.

Update e-mail configuration

In the email section, you can set the address to which the bot sends an email with the number of new followers:

"email" : "[email protected]"

Scheduler configuration

The scheduler is responsible for executing the bot's actions, and you can configure it to sleep during a specific period of the day.

"scheduler" :
{
    "sleep": [6, 13]
}

Here, the scheduler will sleep between 6:00 and 13:00.
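
A sleep window of this kind can be sketched as follows. This is an illustration only: the function `is_sleeping` is hypothetical, and it assumes the two values are whole hours, that the end hour is exclusive, and that a window such as [22, 6] wraps around midnight. pyTweetBot's own scheduler may behave differently.

```python
from datetime import datetime


def is_sleeping(sleep_window, now=None):
    """Return True if the hour of `now` falls inside [start, end)."""
    start, end = sleep_window
    hour = (now or datetime.now()).hour
    if start <= end:
        return start <= hour < end
    # Assumption: a window such as [22, 6] wraps around midnight.
    return hour >= start or hour < end
```

With the `[6, 13]` window above, `is_sleeping([6, 13], datetime(2020, 1, 1, 7, 0))` is true, while the same call at 13:00 is false.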

Hashtags

In the "hashtags" section, you can list text that should be replaced by hashtags in your tweets:

"hashtags":
[
    {"from" : "My Hashtag", "to" : "#MyHashtag", "case_sensitive" : true}
]

Here, occurrences of "My Hashtag" will be replaced by #MyHashtag.
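
The replacement rule can be pictured with the short sketch below. The function `apply_hashtags` is hypothetical and only mirrors the three keys of the configuration entry ("from", "to", "case_sensitive"). How pyTweetBot applies the rules internally is not documented here, so treat this as an assumption.

```python
import re


def apply_hashtags(text, rules):
    """Replace each rule's "from" text by its "to" hashtag."""
    for rule in rules:
        # Assumption: rules are case sensitive unless stated otherwise.
        flags = 0 if rule.get("case_sensitive", True) else re.IGNORECASE
        # A function is used as replacement so "to" is inserted literally.
        text = re.sub(re.escape(rule["from"]),
                      lambda match, to=rule["to"]: to,
                      text, flags=flags)
    return text


rules = [{"from": "My Hashtag", "to": "#MyHashtag", "case_sensitive": True}]
print(apply_hashtags("News about My Hashtag today", rules))
```

Setting "case_sensitive" to false in a rule would also match "my hashtag" in this sketch.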

Twitter

To access Twitter, pyTweetBot needs four tokens for the Twitter API and your username.

"twitter" :
{
    "auth_token1" : "",
    "auth_token2" : "",
    "access_token1" : "",
    "access_token2" : "",
    "user" : ""
}

TODO: tutorial to get the tokens

Friends settings

The friends section has four parameters.

"friends" :
{
	"max_new_followers" : 40,
	"max_new_unfollow" : 40,
	"interval" : [15, 60],
	"unfollow_interval" : 604800
}
  • The max_new_followers parameter sets the maximum number of users that can be followed each day;
  • The max_new_unfollow parameter sets the maximum number of users that can be unfollowed each day;
  • The interval parameter sets the delay in minutes between two follow/unfollow actions, chosen randomly between the min and the max;
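
The random delay described by the interval parameter can be sketched like this. The function name `next_delay_minutes` is hypothetical, and drawing the delay uniformly between the two bounds is an assumption about how "chosen randomly" is implemented.

```python
import random


def next_delay_minutes(interval, rng=random):
    """Draw a delay in minutes between the two bounds of `interval`."""
    low, high = interval
    # Assumption: the delay is uniformly distributed between min and max.
    return rng.uniform(low, high)


delay = next_delay_minutes([15, 60])
print("Waiting %.1f minutes for next run of follow" % delay)
```

To actually wait, the value would be converted to seconds, for example `time.sleep(delay * 60)`.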

Create database

You then have to create the database on your MySQL host.

python -m pyTweetBot tools
    --create-database : Create the database structure on the MySQL host
    --export-database : Export tweets, tweeted and followers/friends to a file
    --import-database : Import tweets, tweeted and followers/friends from a file
    --file : File to import / to export to

You can use the "create-database" action for that:

python -m pyTweetBot tools --config /path/to/config.json --create
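
The usage text lists `--create-database`, while the command above passes `--create`. If the tool parses its options with argparse (it is listed in the prerequisites, but this is an assumption about the implementation), the short form works because argparse accepts any unambiguous prefix of a long option by default. The parser below is a stand-alone sketch with the option names from the usage text, not pyTweetBot's actual parser.

```python
import argparse

# Stand-alone sketch: option names are taken from the usage text above.
parser = argparse.ArgumentParser(prog="tools-sketch")
parser.add_argument("--config")
parser.add_argument("--create-database", action="store_true")
parser.add_argument("--export-database", action="store_true")
parser.add_argument("--import-database", action="store_true")
parser.add_argument("--file")

# "--create" is an unambiguous prefix of "--create-database".
args = parser.parse_args(["--config", "/path/to/config.json", "--create"])
print(args.create_database)
```

The same prefix matching would explain the `--export` and `--import` forms used in the commands that follow.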

You can export the bot's data to a file with the export-database action.

python -m pyTweetBot tools --config /path/to/config.json --export --file export_file.p

You can then import the bot's data from that file:

python -m pyTweetBot tools --config /path/to/config.json --import --file export_file.p

Model training

Create a dataset

The first step to train a model is to create a dataset of positive and negative examples. This can be done with the train command and the "dataset" action.

python -m pyTweetBot train --dataset test.p --config ../nils-config/nilsbot.json --text-size 100 --action dataset --source news

The source argument can take the following values:

  • news : URLs from Google News and RSS streams;
  • tweets : Tweets found directly on Twitter;
  • friends : Description of Twitter users found directly on Twitter;
  • followers : Description of Twitter users found in your list of followers;
  • home : Tweets found on your home feed;

Train a model

Once the dataset is created, we can train a model using the "train" action :

python -m pyTweetBot train --dataset test.p --config ../nils-config/nilsbot.json --model mymodel.p --action train --text-size 100 --classifier SVM
INFO:pyTweetBot:Finalizing training...
INFO:pyTweetBot:Training finished... Saving model to mymodel.p

The classifier parameter can take the following values :

  • NaiveBayes : Naive Bayes classifier;
  • DecisionTree : Simple decision tree;
  • RandomForest : Random forest;
  • SVM : Support Vector Machine;

Test a model

You can test your model's accuracy with the "test" action :

python -m pyTweetBot train --dataset test.p --config ../nils-config/nilsbot.json --model mymodel.p --action test --text-size 100
Success rate of 56.1108362197 on dataset

You can now use your model to classify tweets.

Command line

Launch executors

pyTweetBot launches an executor thread for each action type. You can launch the executor daemon this way:

python -m pyTweetBot executor --config /etc/bots/bot.conf

Find new tweets

python -m pyTweetBot find-tweets --config /etc/bots/bot.conf --model /etc/bots/models/find_tweets.p

Find new retweets

python -m pyTweetBot find-retweets --config /etc/bots/bot.conf --model /etc/bots/models/find_retweets.p

Automatise execution with crontab

Development

Files

pytweetbot's People

Contributors

metmajer, nschaetti, tlwt

pytweetbot's Issues

Documentation for pyTweetBot

The software looks very interesting. I managed to get things up and running in Docker.

Can you provide a brief intro on how to use the tooling / extend the readme a bit?

Possible to consider tweets only from followings?

Hi @nschaetti!

I've just started training my bot using python -m pyTweetBot train --config /data/config.json --dataset /data/dataset-tweets.p --text-size 100 --action dataset --source tweets. Now, most of the tweets I have to analyze and train the bot with are a clear 'no' (lots of meaningless conversations in tweet replies). What would work best for me is to train the bot only with tweets from accounts my account follows. Or would that narrow the dataset too much?

Any ideas? Thanks!

Cannot get pyTweetBot to do some action

Hi @nschaetti, first let me congratulate you on this amazing project. I'd really love to use this as a tweet generator for @getcloudn8tv. Unfortunately, I am not really able to get the bot to do something useful. As you can see, @getcloudn8tv is still fresh and thus has only a very limited number of followers and followees.

I would assume that the execute option would be the best entrypoint to get the bot started. However, even after waiting for longer than an hour, the list of actions remains empty.

# python -m pyTweetBot execute --config /data/config.json --log-level 10 --log-file /data/log
INFO:pyTweetBot:Start thread for action type Tweet...
INFO:pyTweetBot:Start thread for action type Retweet...
INFO:pyTweetBot:Start thread for action type Like...
INFO:pyTweetBot:Start thread for action type Follow...
INFO:pyTweetBot:Start thread for action type Unfollow...
DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 5.3 minutes for next run of tweet
DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 7.9 minutes for next run of retweet
DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 7.9 minutes for next run of like
DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 16.4 minutes for next run of follow
DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 36.5 minutes for next run of unfollow


DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 6.5 minutes for next run of tweet
DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 4.4 minutes for next run of like
DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 4.1 minutes for next run of retweet

DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 6.2 minutes for next run of tweet
DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 4.2 minutes for next run of retweet
DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 6.8 minutes for next run of like


DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 5.8 minutes for next run of retweet

DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 39.0 minutes for next run of follow
DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 9.1 minutes for next run of tweet
DEBUG:pyTweetBot:_get_exec_action : []
INFO:pyTweetBot:Waiting 5.4 minutes for next run of like

Now, I don't get a lot of information from the log output. I would really like to see if the bot's keyword search via Google News or for relevant Tweets does get at least some results or not. While the bot was running, a Twitter user that the bot follows (@metmajer) has retweeted this tweet, which contains a keyword that is contained in the config's retweet.keywords (Kubernetes).

So, my next approach was to use the find-tweets command to see whether that gives more insights, but then again this requires me to define a MODEL and I cannot see where this is created.

# python -m pyTweetBot find-tweets
usage: pyTweetBot find-tweets [-h] --config CONFIG [--log-level LOG_LEVEL]
                              [--log-file LOG_FILE] --model MODEL
                              [--threshold THRESHOLD] [--n-pages N_PAGES]
                              [--text-size TEXT_SIZE]

Unfortunately, the documentation on model training doesn't explain this important step. @nschaetti, @tlwt, any additional help would be most valuable. Cheers!

Unable to install pytweetbot with pip install

Collecting pyTweetBot
Downloading pyTweetBot-0.1.3.tar.gz (64 kB)
|████████████████████████████████| 64 kB 158 kB/s
Collecting nltk
Downloading nltk-3.5.zip (1.4 MB)
|████████████████████████████████| 1.4 MB 437 kB/s
Collecting argparse
Downloading argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Collecting logging
Downloading logging-0.4.9.6.tar.gz (96 kB)
|████████████████████████████████| 96 kB 149 kB/s
ERROR: Command errored out with exit status 1:
command: 'c:\users\mushe\appdata\local\programs\python\python38-32\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\mushe\AppData\Local\Temp\pip-install-inc1tcgh\logging\setup.py'"'"'; __file__='"'"'C:\Users\mushe\AppData\Local\Temp\pip-install-inc1tcgh\logging\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\mushe\AppData\Local\Temp\pip-pip-egg-info-ja6du_8c'
cwd: C:\Users\mushe\AppData\Local\Temp\pip-install-inc1tcgh\logging
Complete output (48 lines):
running egg_info
creating C:\Users\mushe\AppData\Local\Temp\pip-pip-egg-info-ja6du_8c\logging.egg-info
writing C:\Users\mushe\AppData\Local\Temp\pip-pip-egg-info-ja6du_8c\logging.egg-info\PKG-INFO
writing dependency_links to C:\Users\mushe\AppData\Local\Temp\pip-pip-egg-info-ja6du_8c\logging.egg-info\dependency_links.txt
writing top-level names to C:\Users\mushe\AppData\Local\Temp\pip-pip-egg-info-ja6du_8c\logging.egg-info\top_level.txt
writing manifest file 'C:\Users\mushe\AppData\Local\Temp\pip-pip-egg-info-ja6du_8c\logging.egg-info\SOURCES.txt'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\mushe\AppData\Local\Temp\pip-install-inc1tcgh\logging\setup.py", line 3, in <module>
setup(name = "logging",
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\distutils\dist.py", line 966, in run_commands
self.run_command(cmd)
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\site-packages\setuptools\command\egg_info.py", line 297, in run
self.find_sources()
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\site-packages\setuptools\command\egg_info.py", line 304, in find_sources
mm.run()
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\site-packages\setuptools\command\egg_info.py", line 535, in run
self.add_defaults()
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\site-packages\setuptools\command\egg_info.py", line 571, in add_defaults
sdist.add_defaults(self)
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\distutils\command\sdist.py", line 226, in add_defaults
self.add_defaults_python()
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\site-packages\setuptools\command\sdist.py", line 135, in add_defaults_python
build_py = self.get_finalized_command('build_py')
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\distutils\cmd.py", line 298, in get_finalized_command
cmd_obj = self.distribution.get_command_obj(command, create)
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\distutils\dist.py", line 857, in get_command_obj
klass = self.get_command_class(command)
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\site-packages\setuptools\dist.py", line 764, in get_command_class
self.cmdclass[command] = cmdclass = ep.load()
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\site-packages\pkg_resources\__init__.py", line 2462, in load
return self.resolve()
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\site-packages\pkg_resources\__init__.py", line 2468, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\site-packages\setuptools\command\build_py.py", line 16, in <module>
from setuptools.lib2to3_ex import Mixin2to3
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\site-packages\setuptools\lib2to3_ex.py", line 13, in <module>
from lib2to3.refactor import RefactoringTool, get_fixers_from_package
File "c:\users\mushe\appdata\local\programs\python\python38-32\lib\lib2to3\refactor.py", line 19, in <module>
import logging
File "C:\Users\mushe\AppData\Local\Temp\pip-install-inc1tcgh\logging\logging\__init__.py", line 618
raise NotImplementedError, 'emit must be implemented '
^
SyntaxError: invalid syntax
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

'FriendsManager' object has no attribute 'n_following'

Hi @nschaetti! I am using the latest version from the developing trunk and came across the following issue when calling python -m pyTweetBot friends --config /data/config.json --log-level 10 --update:

$ python -m pyTweetBot friends --config /data/config.json --log-level 10 --update
INFO:pyTweetBot:Updating followers...
[...]
INFO:pyTweetBot:Updating followings...
[...]
INFO:pyTweetBot:New following CloudNativeFdn
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/app/pyTweetBot/__main__.py", line 406, in <module>
    friends_manager.update()
  File "pyTweetBot/friends/FriendsManager.py", line 280, in update
    self.update_statistics()
  File "pyTweetBot/friends/FriendsManager.py", line 301, in update_statistics
    statistic = pyTweetBot.db.obj.Statistic(statistic_friends_count=self.n_following(),
AttributeError: 'FriendsManager' object has no attribute 'n_following'
