twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.

License: MIT License

Python 99.82% Dockerfile 0.18%
osint twitter python scrape tweets elasticsearch kibana scrape-followers scrape-likes scrape-following

twint's Introduction

TWINT - Twitter Intelligence Tool



No authentication. No API. No limits.

Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API.

Twint utilizes Twitter's search operators to let you scrape Tweets from specific users, scrape Tweets relating to certain topics, hashtags, and trends, or sort out sensitive information from Tweets such as e-mail addresses and phone numbers. I find this very useful, and you can get really creative with it too.

Twint also makes special queries to Twitter, allowing you to scrape a Twitter user's followers, Tweets a user has liked, and who they follow, without any authentication, API, Selenium, or browser emulation.

tl;dr Benefits

Some of the benefits of using Twint vs. the Twitter API:

  • Can fetch almost all Tweets (Twitter API limits to last 3200 Tweets only);
  • Fast initial setup;
  • Can be used anonymously and without Twitter sign up;
  • No rate limitations.

Limits imposed by Twitter

Twitter limits scrolling while browsing the user timeline. This means that with .Profile or with .Favorites you will be able to get ~3200 Tweets.
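
A common workaround (a usage pattern, not a Twint feature) is to page through older history with Search over explicit date windows, since search queries are not bound by the timeline cap. A minimal sketch, assuming the Config attributes shown elsewhere in this README:

import twint

# Sketch: fetch one month-sized window; repeat with shifted
# Since/Until values to walk further back than ~3200 Tweets.
c = twint.Config()
c.Username = "username"
c.Since = "2017-01-01"
c.Until = "2017-02-01"
c.Store_csv = True
c.Output = "archive.csv"
twint.run.Search(c)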

Requirements

  • Python 3.6;
  • aiohttp;
  • aiodns;
  • beautifulsoup4;
  • cchardet;
  • dataclasses;
  • elasticsearch;
  • pysocks;
  • pandas (>=0.23.0);
  • aiohttp_socks;
  • schedule;
  • geopy;
  • fake-useragent;
  • py-googletransx.

Installing

Git:

git clone --depth=1 https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt

Pip:

pip3 install twint

or

pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint

Pipenv:

pipenv install git+https://github.com/twintproject/twint.git#egg=twint

March 2, 2021 Update

Added: Dockerfile

Noticed a lot of people are having issues installing (including me). Please use the Dockerfile temporarily while I look into them.

CLI Basic Examples and Combos

A few simple examples to help you understand the basics:

  • twint -u username - Scrape all the Tweets of a user (doesn't include retweets but includes replies).
  • twint -u username -s pineapple - Scrape all Tweets from the user's timeline containing pineapple.
  • twint -s pineapple - Collect every Tweet containing pineapple from everyone's Tweets.
  • twint -u username --year 2014 - Collect Tweets that were tweeted before 2014.
  • twint -u username --since "2015-12-20 20:30:15" - Collect Tweets that were tweeted since 2015-12-20 20:30:15.
  • twint -u username --since 2015-12-20 - Collect Tweets that were tweeted since 2015-12-20 00:00:00.
  • twint -u username -o file.txt - Scrape Tweets and save to file.txt.
  • twint -u username -o file.csv --csv - Scrape Tweets and save as a csv file.
  • twint -u username --email --phone - Show Tweets that might have phone numbers or email addresses.
  • twint -s "Donald Trump" --verified - Display Tweets by verified users that Tweeted about Donald Trump.
  • twint -g="48.880048,2.385939,1km" -o file.csv --csv - Scrape Tweets from a radius of 1km around a place in Paris and export them to a csv file.
  • twint -u username -es localhost:9200 - Output Tweets to Elasticsearch.
  • twint -u username -o file.json --json - Scrape Tweets and save as a json file.
  • twint -u username --database tweets.db - Save Tweets to a SQLite database.
  • twint -u username --followers - Scrape a Twitter user's followers.
  • twint -u username --following - Scrape who a Twitter user follows.
  • twint -u username --favorites - Collect all the Tweets a user has favorited (gathers ~3200 Tweets).
  • twint -u username --following --user-full - Collect full user information of the people a user follows.
  • twint -u username --timeline - Use an effective method to gather Tweets from a user's profile (Gathers ~3200 Tweets, including retweets & replies).
  • twint -u username --retweets - Use a quick method to gather the last 900 Tweets (that includes retweets) from a user's profile.
  • twint -u username --resume resume_file.txt - Resume a search starting from the last saved scroll-id.

More details about the commands and options are located in the wiki.

Module Example

Twint can now be used as a module and supports custom formatting. More details are located in the wiki.

import twint

# Configure
c = twint.Config()
c.Username = "realDonaldTrump"
c.Search = "great"

# Run
twint.run.Search(c)

Output

955511208597184512 2018-01-22 18:43:19 GMT <now> pineapples are the best fruit

import twint

c = twint.Config()

c.Username = "noneprivacy"
c.Custom["tweet"] = ["id"]
c.Custom["user"] = ["bio"]
c.Limit = 10
c.Store_csv = True
c.Output = "none"

twint.run.Search(c)
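
The wiki also documents a Format option for fully custom output lines; a minimal sketch, assuming the {token} placeholder names used there:

import twint

c = twint.Config()
c.Username = "noneprivacy"
# Token names ({id}, {date}, {time}, {username}, {tweet}) are assumed
# from the wiki's custom-formatting page.
c.Format = "{id} | {date} {time} | {username}: {tweet}"
twint.run.Search(c)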

Storing Options

  • Write to file;
  • CSV;
  • JSON;
  • SQLite;
  • Elasticsearch.
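
When using Twint as a module, each storing option maps to a Config attribute; a sketch of the common combinations, assuming the attribute names mirror the CLI flags used above:

import twint

c = twint.Config()
c.Username = "username"
c.Store_json = True            # JSON, like --json
c.Output = "tweets.json"
# c.Store_csv = True           # CSV, like --csv (set Output to a .csv path)
# c.Database = "tweets.db"     # SQLite, like --database
twint.run.Search(c)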

Elasticsearch Setup

Details on setting up Elasticsearch with Twint are located in the wiki.
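
For module use, the CLI's -es flag corresponds to the Elasticsearch attribute; a minimal sketch, assuming an instance on the default port:

import twint

c = twint.Config()
c.Username = "username"
c.Elasticsearch = "localhost:9200"  # same value the -es flag takes
twint.run.Search(c)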

Graph Visualization


Graph details are also located in the wiki.

We are developing a Twint Desktop App.


FAQ

I tried scraping tweets from a user, I know that they exist but I'm not getting them

Twitter can shadow-ban accounts, which means that their Tweets will not be available via search. To work around this, pass --profile-full if you are using Twint via the CLI or, if you are using Twint as a module, add config.Profile_full = True. Please note that this process will be quite slow.
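
As a module, that looks like this:

import twint

c = twint.Config()
c.Username = "username"
c.Profile_full = True  # slower, but catches Tweets hidden from search
twint.run.Profile(c)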

More Examples

Followers/Following

To get only follower usernames/following usernames

twint -u username --followers

twint -u username --following

To get user info of followers/following users

twint -u username --followers --user-full

twint -u username --following --user-full

userlist

To get only the user info of a user

twint -u username --user-full

To get user info of users from a userlist

twint --userlist inputlist --user-full
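
The same lookups as a module; a sketch using the run functions that mirror the CLI flags above:

import twint

c = twint.Config()
c.Username = "username"
c.User_full = True          # full user info, like --user-full
twint.run.Followers(c)      # like --followers
# twint.run.Following(c)    # like --following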

tweet translation (experimental)

To get 100 English Tweets and translate them to Italian

twint -u noneprivacy --csv --output none.csv --lang en --translate --translate-dest it --limit 100

or

import twint

c = twint.Config()
c.Username = "noneprivacy"
c.Limit = 100
c.Store_csv = True
c.Output = "none.csv"
c.Lang = "en"
c.Translate = True
c.TranslateDest = "it"
twint.run.Search(c)


Contact

If you have any questions, want to join in discussions, or need extra help, you are welcome to join our Twint-focused channel at OSINT team.


twint's Issues

Some questions regarding CSV

Question:

  1. Are pipe characters (|) properly escaped when dumping output to CSV (for compatibility)?
  2. Why are Emojis represented as <Emoji: blah> instead of proper Unicode?
  3. Can some of the link abbreviations be unpacked to their original form, or put in a separate column?

Append to csv/file on update

Instead of running the script and having it download and save all output to a file each time, is there a way to just append the latest Tweets to the file? Some way of comparing what we have versus what we're missing? (A possible approach is sketched after this question.)

Thanks
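
For the append question above, Twint itself only offers --resume; one possible approach (a sketch, not a Twint feature) is to diff on Tweet IDs before appending:

import csv

# Sketch: append only rows whose Tweet ID is not already in the CSV.
# Assumes a tab-delimited file whose first column is the Tweet ID.

def known_ids(path):
    try:
        with open(path, newline="", encoding="utf-8") as f:
            return {row[0] for row in csv.reader(f, delimiter="\t") if row}
    except FileNotFoundError:
        return set()

def append_new(path, rows):
    seen = known_ids(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for row in rows:
            if row[0] not in seen:
                writer.writerow(row)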

Extract user tweets by userID instead of name/handle

If I have a list of Twitter users I have to scrape, but the list contains IDs instead of handles, is it possible to use a converter like https://tweeterid.com/ to do the conversion?
Twitter has an official solution, which is scraping a redirect from https://twitter.com/intent/user?user_id=${number}, however there is no shortcut in the search bar to do it. (A scripted version of this trick is sketched below.)
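
The redirect trick described above can be scripted; a sketch (not part of Twint, and assuming Twitter answers the intent URL with a redirect to the profile):

import requests

def handle_from_id(user_id):
    # Follow the intent URL; if Twitter redirects to the profile,
    # the handle is the last path segment of the final URL.
    r = requests.get(
        "https://twitter.com/intent/user?user_id={}".format(user_id),
        allow_redirects=True, timeout=10)
    return r.url.rstrip("/").rsplit("/", 1)[-1]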

Problem with -o --csv in the new Twint version released today

Initial Check

Make sure you've checked the following.

  • [] Python version is 3.5 or higher.
  • [] Using the latest version of Twint.

If this is a feature request please specify in the title:

Example:

[REQUEST] More features!

Command

Please provide the exact command ran including the username/search so I may reproduce the issue.

Description of Issue

Please use as much detail as possible.

OS Details

Using Windows, Linux? What OS version?

The -o parameter in the new version released today does not work

Initial Check

Make sure you've checked the following.

  • Python version is 3.5 or higher.
  • Using the latest version of Twint.

Command

python twint.py -u nestorpomar --output file.csv --csv
python twint.py -u nestorpomar -o file.csv --csv

Description of Issue

With the previous Twint version I didn't have any problem writing CSV files.
I just installed the new version released today, and when I use -o or --output, the execution gives the following error message:

Traceback (most recent call last):
  File "twint.py", line 104, in <module>
    main()
  File "twint.py", line 64, in main
    if args.csv and args.o is None:
AttributeError: 'Namespace' object has no attribute 'o'

If I try with the previous version, it works without problems.

OS Details

ubuntu 16

Does not work with some twitter accounts

Twitter accounts with "This profile may include potentially sensitive content" on the front,
where you have to click through to access the content, will break Tweep.
The profile has a button that looks like:
<button class="EdgeButton EdgeButton--tertiary ProfileWarningTimeline-button">Yes, view profile</button>

Tweep stops scraping after about 15000/20000 tweets back

Command

python3 tweep.py -s eurusd

Description of Issue

I want to ask why, when I start scraping for a keyword, the scraping process ends after going about 10 days back.
I would like to scrape at least 1000 days back.
The number of scraped Tweets changes each time I run "python3 tweep.py -s eurusd".

OS Details

Debian 9

[Error] Failing before limit

I've run Twint successfully a lot with no issues (thanks! it's great!), but starting last night, a number of attempts have run into problems. I'm wondering what you make of these errors. I do have Python 3.6.5 and the latest version of Twint. I'm running on Mac OS High Sierra 10.13.4.

python3 twint.py -u NickKristof -o NickKristof_m_lib_twint.csv --csv
(runs for a few hundred tweets and then...)

Traceback (most recent call last):
  File "twint.py", line 193, in fetch
    async with session.get(url) as response:
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 783, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 333, in _request
    await resp.start(conn, read_until_eof)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 695, in start
    (message, payload) = await self._protocol.read()
  File "/usr/local/lib/python3.6/site-packages/aiohttp/streams.py", line 533, in read
    await self._waiter
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "twint.py", line 771, in <module>
    loop.run_until_complete(main())
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
    return future.result()
  File "twint.py", line 689, in main
    feed, init, count = await getTweets(init)
  File "twint.py", line 510, in getTweets
    tweets, init = await getFeed(init)
  File "twint.py", line 291, in getFeed
    response = await fetch(session, await getUrl(init))
  File "twint.py", line 194, in fetch
    return await response.text()
  File "/usr/local/lib/python3.6/site-packages/async_timeout/__init__.py", line 38, in __exit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.6/site-packages/async_timeout/__init__.py", line 83, in _do_exit
    raise asyncio.TimeoutError
concurrent.futures._base.TimeoutError

Doesn't scrape Tweets till the specified date

Initial Check

Make sure you've checked the following.

  • Python version is 3.5 or higher.
  • Using the latest version of Tweep.

Command

python tweep.py -s Bitcoin --since 2018-01-01 -o file.csv --csv

Description of Issue

The module stops scraping Tweets beyond a certain time frame, whereas it is expected to scrape all posts since the 1st of January, as specified in the command-line args.
Example: If the latest Tweet was on 2018-03-16 21:30:57 IST, the module retrieves Tweets till about 2018-03-16 06:54:15 IST.

OS Details

OS Name: Microsoft Windows 10 Home Single Language
OS Version: 10.0.16299 N/A Build 16299

__getitem__ error

Previously I was able to use Tweep. Now not anymore.
I get this error:

TypeError: 'NoneType' object has no attribute '__getitem__'

what next?

lorenzo

Why this gap between Tweets posted and Tweets downloaded? [no bug]

I don't think this is a bug, but I'm just wondering why there is such a gap between the number of Tweets I'm supposed to have posted (+/- 1700) and the Tweets I really download (+/- 770).
Using the --count option was really disappointing from this point of view. Any idea about that?

friends and followers

Hi,

do you think the code could be easily extended to also mine the followers and friends of a (list of) user(s)?

Thanks

TypeError: 'NoneType'

I've been testing the script with different usernames and languages, and it's showing an error at line 78.

It retrieves four Tweets and then it breaks, showing the following error:

Traceback (most recent call last):
  File "tweep.py", line 168, in <module>
    tweep().main()
  File "tweep.py", line 130, in main
    self.get_tweets()
  File "tweep.py", line 78, in get_tweets
    datestamp = tweet.find('a','tweet-timestamp')['title'].rpartition(' - ')[-1]
TypeError: 'NoneType' object has no attribute '__getitem__'

[REQUEST] New fields in CSV, JSON and DB; header in CSV - modified code included

Initial Check

Make sure you've checked the following.

  • Python version is 3.5 or higher.
  • Using the latest version of Twint. (I realize you made some changes in the last hours, so it's not the latest but yesterday's version.)

If this is a feature request please specify in the title:

Example:

[REQUEST]
I want to get and store some fields that I think your code doesn't handle out of the box.
The fields are:

  • user_id
  • mentions
  • permanent link

Apart from that, I want to include a header in the CSV output describing what each column is, but only if the file does not exist.

I want to share with you the modifications I did, just in case you consider they could be interesting for you and the rest of the people.

I checked the CSV and JSON outputs but not storing the data in a database.
The lines I modified / included are marked with "added" comments below.

file: db.py (it confused me, as the code is commented....)

import os
import sqlite3

table_tweets = """
CREATE TABLE IF NOT EXISTS
    tweets (
        id integer primary key,
        date text not null,
        time text not null,
        timezone text not null,
        user text not null,
        tweet text not null,
        replies integer,
        likes integer,
        retweets integer,
        hashtags text,
        mentions text,       -- added
        userid integer,      -- added
        permanentlink text   -- added
    );
"""

def tweets(conn, Tweet):
    try:
        cursor = conn.cursor()
        entry = (Tweet.id,
                 Tweet.datestamp,
                 Tweet.timestamp,
                 Tweet.timezone,
                 Tweet.username,
                 Tweet.tweet,
                 Tweet.replies,
                 Tweet.likes,
                 Tweet.retweets,
                 ",".join(Tweet.hashtags),
                 ",".join(Tweet.mentions),  # added
                 Tweet.userid,              # added
                 Tweet.permanentlink,)      # added
        cursor.execute("INSERT INTO tweets VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?)", entry)
        conn.commit()
    except sqlite3.IntegrityError:
        pass

====================
file: tweet.py

class Tweet(object):
    id = ""
    date = ""
    datestamp = ""
    time = ""
    timestamp = ""
    timezone = ""
    username = ""
    tweet = ""  # text
    replies = "0"
    likes = "0"
    retweets = "0"
    hashtags = ""
    mentions = ""        # added
    userid = ""          # added
    permanentlink = ""   # added

==================
file: output.py

import csv   # imports needed by the excerpts below
import json
import os
import re

def writeJSON(Tweet, file):
    data = {
        "id": Tweet.id,
        "date": Tweet.datestamp,
        "time": Tweet.timestamp,
        "timezone": Tweet.timezone,
        "username": Tweet.username,
        "tweet": Tweet.tweet,
        "replies": Tweet.replies,
        "retweets": Tweet.retweets,
        "likes": Tweet.likes,
        "hashtags": ",".join(Tweet.hashtags),
        "mentions": ",".join(Tweet.mentions),    # added
        "user_id": Tweet.userid,                 # added
        "permanent_link": Tweet.permanentlink}   # added

    with open(file, "a", newline='', encoding="utf-8") as json_file:
        json.dump(data, json_file)
        json_file.write("\n")

def writeCSV(Tweet, file):
    data = [
        Tweet.id,
        Tweet.datestamp,
        Tweet.timestamp,
        Tweet.timezone,
        Tweet.username,
        Tweet.tweet,
        Tweet.replies,
        Tweet.retweets,
        Tweet.likes,
        ",".join(Tweet.hashtags),
        ",".join(Tweet.mentions),   # added
        Tweet.userid,               # added
        Tweet.permanentlink]        # added
    if not os.path.exists(file):    # added: write a header row for new files
        with open(file, "a", newline='', encoding="utf-8") as csv_file:
            writer = csv.writer(csv_file, delimiter="\t")
            writer.writerow(["tweetid", "date", "time", "timezone", "user", "text",
                             "replies", "retweets", "likes", "hashtags",
                             "mentions", "user_id", "permanent Link"])
            writer.writerow(data)
    else:
        with open(file, "a", newline='', encoding="utf-8") as csv_file:
            writer = csv.writer(csv_file, delimiter="\t")
            writer.writerow(data)

def getMentions_nes(text):   # added
    mentions = re.findall(r'(?i)@\w+', text, flags=re.UNICODE)
    return mentions
    #return ",".join(mentions)

Sort HTML

import datetime, sys               # imports needed by this excerpt
from time import gmtime, strftime

def getTweet(tw, config):
    t = Tweet()
    t.id = tw.find("div")["data-item-id"]
    t.date = getDate(tw)
    if config.Since and config.Until:
        if (t.date.date() - datetime.datetime.strptime(config.Since, "%Y-%m-%d").date()).days == -1:
            # mitigation here, maybe find something better
            sys.exit(0)
    t.datestamp = t.date.strftime("%Y-%m-%d")
    t.time = getTime(tw)
    t.timestamp = t.time.strftime("%H:%M:%S")
    t.username = tw.find("span", "username").text.replace("@", "")
    t.timezone = strftime("%Z", gmtime())
    for img in tw.findAll("img", "Emoji Emoji--forText"):
        img.replaceWith("<{}>".format(img['aria-label']))
    t.tweet = getMentions(tw, getText(tw))
    t.hashtags = getHashtags(t.tweet)
    t.mentions = getMentions_nes(t.tweet)   # added
    t.replies = getStat(tw, "reply")
    t.retweets = getStat(tw, "retweet")
    t.likes = getStat(tw, "favorite")
    t.userid = tw.find("div")["data-user-id"]                                        # added
    t.permanentlink = 'https://twitter.com' + tw.find("div")["data-permalink-path"]  # added

    return t

===================

OS Details

Using Windows, Linux? Linux
What OS version? Ubuntu 16

thank you very much for your help

Not getting data from a username

I ran Twint against a username yesterday and it worked great. Today I'm trying to do the same, and the command is just sitting there, not doing anything. If I choose another username it seems to work. I don't see anywhere that it shows a log file to maybe try to debug the issue or find out why it isn't working now.

username is @rusthackreport

Cheers

UserID as data column

For ease of retrieval, it would be great if the UserID could be used for getting Tweets (instead of only the username).
The same can be said for TweetID, reply/quote TweetID, etc.

entry for table 'users' shows none for multiuser entry

When using python3 tweep.py -s 'truedonaldtrump' --database test.db, the saved table 'users' in SQLite contains the entry 'None' instead of the usernames, so multi-user entries won't work.
If you use python3 tweep.py -u 'truedonaldtrump' --database test.db, the entry in 'users' is right.
Bug or feature?

Tweep vs Tweepy

What are some of the disadvantages of using Tweep compared to APIs that use auth, such as Tweepy? The README only lists advantages.

e.g. Will Tweep miss tweets that the Twitter API wouldn't?

Event loop is running exception

Greetings! I would be glad to receive some clarification.

Command

import twint

c = twint.Config()
c.Search = 'sometext'
c.Since = '2018-04-26'
c.Until = '2018-04-26'
result = twint.Search(c)

Description of Issue

Ran this code. Exception happening:

~/Analysis/toolbox/twint/twint/search.py in __init__(self, config)
     35 
     36                 loop = asyncio.get_event_loop()
---> 37                 loop.run_until_complete(self.main())
     38 
     39         async def Feed(self):

/usr/lib/python3.5/asyncio/base_events.py in run_until_complete(self, future)
    373         future.add_done_callback(_run_until_complete_cb)
    374         try:
--> 375             self.run_forever()
    376         except:
    377             if new_task and future.done() and not future.cancelled():

/usr/lib/python3.5/asyncio/base_events.py in run_forever(self)
    338         self._check_closed()
    339         if self.is_running():
--> 340             raise RuntimeError('Event loop is running.')
    341         self._set_coroutine_wrapper(self._debug)
    342         self._thread_id = threading.get_ident()

RuntimeError: Event loop is running.

Afterwards, the Jupyter kernel just dies.

OS Details

Ubuntu 16.04, jupyter notebook

Possibility of downloading threads w/replies?

Initial Check

Make sure you've checked the following.

  • [] Python version is 3.5 or higher.
  • [] Using the latest version of Tweep.
    All good here.

Command

Please provide the exact command ran including the username/search so I may reproduce the issue.

Looking to see if the option --to:username can be added to cover replies if possible given how the tool works.

Description of Issue (this is a feature request, not so much an issue)

If possible, a way to get an entire Twitter thread started by a given user, or one a given user has participated in, would help in providing context. I understand if this is not feasible given the way the tool works, but I wanted to see if that possibility had been explored. Otherwise this tool is brilliant!

OS Details

Using Windows, Linux? What OS version?

terminal-free usage

Description of Issue

The current usage of Tweep requires the CLI, so in a Jupyter notebook !python3 tweep.py ... is needed for use in other scripts.
It would be great if there were a way of passing the args without the need for terminal calls.

live streaming tweets

First of all, thanks for this tool, it's very helpful 😁
How do I collect, as a live stream, every Tweet containing pineapple?

Scraping Stopping at between 8k and 12k tweets

First of all, thank you for such a great tool!

Description of Issue

I am scraping an entire timeline of a user w/ 150K+ Tweets. The output runs fine for 8-12k Tweets (usually about 30 days), and then stops. I have seen previous reports of scraping stopping, but I'm not sure if they are related. I have been adding the --until flag and working through approx 30-60 days at a time. Is this a bug, user error, or a Twitter limitation? Is there an easy way to batch commands together and run in chunks w/ --since and --until boundaries? (One way to batch the windows is sketched after this issue.)

OS Details

Ubuntu / Buscador OSINT

Initial Check

Make sure you've checked the following.

  • [YES] Python version is 3.5 or higher.
  • [YES] Using the latest version of Twint.

Command

python3 twint.py -u username -o timeline.csv --csv

Once it fails, I then add a --until flag:
python3 twint.py -u username --until 2017-06-01 -o timeline.csv --csv

Thank you!
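
For the batching question above, one way (a sketch, not a Twint feature) is to drive the CLI over fixed --since/--until windows from a small script:

import datetime
import subprocess

# Sketch: scrape a long timeline in 30-day chunks so each run stays
# below the point where scraping tends to stall.
start = datetime.date(2017, 1, 1)
end = datetime.date(2018, 1, 1)
step = datetime.timedelta(days=30)

while start < end:
    until = min(start + step, end)
    subprocess.run([
        "twint", "-u", "username",
        "--since", start.isoformat(), "--until", until.isoformat(),
        "-o", "timeline.csv", "--csv",
    ], check=False)
    start = until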

Month Scrape Limit

It seems tweet scraping is limited to a month back from the specified date, even when using the following commands:

python twint.py -u BBC --since 2006-03-21 -o BBC.csv --csv
python twint.py -u BBC --since 2006-03-21 --until 2017-06-01 -o BBC.csv --csv

This might be an issue, or am I overlooking an option?

I am running Twint on Windows 10, with Python 3.6.4 (Anaconda 64 bit).

Since and until together

Hey There,

Thanks for this great code repository!

The following command captures Tweets older than 1 December 2017. Is my command wrong?

Command

python3 twint.py -s metoo -l de -o test2.csv --csv --until 2017-12-02 --since 2017-12-01

Description of Issue

Combination of since and until does not work

OS Details

Mac OS

Cheers,
Felix

Scrape tweets from city

Initial Check

Make sure you've checked the following.

  • Python version is 3.5 or higher.
  • Using the latest version of Tweep.

Command

Please provide the exact command ran including the username/search so I may reproduce the issue.

python tweep.py -s hello -g -38.416097,-63.616671999999994,100km -o file.csv --csv

Description of Issue

I'd like to scrape Tweets from the city of Buenos Aires. When I try it, I get this error:

usage: python3 tweep.py [options]
tweep.py: error: argument -g: expected one argument

How can I solve it?

OS Details

Using Windows, Linux? What OS version?
Windows

Can't collect more than 213947 tweets

Initial Check

Make sure you've checked the following.

  • [y] Python version is 3.5 or higher.
  • [y] Using the latest version of Tweep.

Command

tweep.py -s #metoo --since 2017-10-01 --stats --count -o metoo_data.csv --csv

Description of Issue

I am trying to collect tweets containing #MeToo since 1 October 2017, however the scraper always stops when it gets to about 213947 tweets. This means that the scraper begins at the current date, starts to scrape backwards in time, but it never gets past a few weeks of tweets because it hits the (apparent) limit.

I have tried to exclude 2018, which means that the scraper begins at 31 December 2017, but again it only manages to scrape a few weeks of tweets before it hits the limit.

Any possible solutions much appreciated.

OS Details

Windows 10; completely up to date.

Install tweep

Initial Check

Make sure you've checked the following.

  • [] Python version is 3.5 or higher.
  • [] Using the latest version of Tweep.

Command

Please provide the exact command ran including the username/search so I may reproduce the issue.

Description of Issue

Please use as much detail as possible.

OS Details

Using Windows, Linux? What OS version?

[Windows] Python (Anaconda) UnicodeEncodeError

Hello,
When I run it I get this error. How should I fix it?
Traceback (most recent call last):
  File "tweep.py", line 289, in <module>
    loop.run_until_complete(main())
  File "C:\Users\ION\Anaconda3\lib\asyncio\base_events.py", line 467, in run_until_complete
    return future.result()
  File "tweep.py", line 237, in main
    feed, init, count = await getTweets(init)
  File "tweep.py", line 206, in getTweets
    print(await outTweet(tweet))
  File "tweep.py", line 182, in outTweet
    writer.writerow(dat)
  File "C:\Users\ION\Anaconda3\lib\encodings\cp1253.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 92-93: character maps to <undefined>
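
A common workaround for charmap errors like the one above (general Python advice, not from this thread) is to force UTF-8 output regardless of the Windows console code page:

import io
import sys

# Sketch: re-wrap stdout so printing characters outside cp1253
# doesn't raise; lossy replacement instead of a crash.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace")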

Replace comma separator in csv format by 'pipe'

Hi,

May I suggest replacing the "," separator in the CSV format with "|"? Or simply quoting the fields of the CSV file.
The comma is difficult to interpret when there are commas in the tweet message itself. (Both options are sketched below.)

I love this tool. Thanks.
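
Quoting alone is enough for standards-compliant readers; a sketch of both suggestions using only the standard library:

import csv

row = ["955511208597184512", "2018-01-22", "pineapples, the best fruit"]

# Option 1: keep commas but quote every field (RFC 4180 style).
with open("quoted.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f, quoting=csv.QUOTE_ALL).writerow(row)

# Option 2: use a pipe delimiter instead.
with open("piped.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f, delimiter="|").writerow(row)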

The scraping is stopping

First of all, congratulations on your work; here at the University of Goiás, Brazil, this code is helping us so much.
After you made some modifications and added other commands, I think something is going wrong, and I want to know if you can help me.
For example, the command python3 tweep.py -s "bolsonaro2018" --since 2018-01-01 -o testbolsonaro.csv --csv stops on the date 2018-04-03. I have already tried on Windows 10 and Ubuntu 17.10, with the current version of Tweep and older ones.
Thanks

Limits >= 20 don't work, Tweets option doesn't exist

Initial Check

Make sure you've checked the following.

  • [] Python version is 3.5 or higher. True
  • [] Using the latest version of Tweep. True

Command

python tweep.py -u {account} --limit 20

Description of Issue

For account = "binance":
Only the 5 latest Tweets get parsed, plus one Tweet from 2017 by @bi_15174872228.

For account = "vergecurrency":
If --limit < 20, Tweets get parsed indefinitely (all of them).
If --limit >= 20, exactly 20 Tweets get parsed no matter the --limit value.

OS Details

Windows 8.1 64

UPD: Just noticed that tweets on line 180 doesn't exist, therefore --tweets doesn't work.

[Idea] Tweets that are against the ToS

Got a crazy idea for a new feature to filter out Tweets that are against the ToS. It would be useful for a few reasons:

  • Make suspending a person's Twitter account easier
  • Make Twitter a safer place

Our SJW users will love it! lol I'll work on this later this week.

Custom percent-sign output format and/or CSV support

Assuming that the data downloaded from twitter are uniformed "tuples", is it possible to format the text into a CSV format for compatibility and easy formatting?
One of the major problem though is that according to RFC4180, double quotes, commas and newlines all requires escape characters.
If not, it would be nice if people are allowed to use percent-escape form (like the date bash function) to customize to their own needs.
This idea came from https://github.com/jonbakerfish/TweetScraper where they literally have MongoDB as part of their application, such that everything is in a regular and organized manner.

Error while getting Tweets removed for Copyright

The command was :

./tweep.py -u --csv -o tweets.csv

Traceback (most recent call last):
  File "./tweep.py", line 145, in <module>
    loop.run_until_complete(main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "./tweep.py", line 115, in main
    feed, init = await getTweets(init)
  File "./tweep.py", line 72, in getTweets
    datestamp = tweet.find("a", "tweet-timestamp")["title"].rpartition(" - ")[-1]
TypeError: 'NoneType' object is not subscriptable

multiple search

Hi, is it possible to do a search with multiple keywords using 'and' & 'or' operators?
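
Since Twint hands the -s string to Twitter's search, Twitter's own operators should apply (behavior ultimately depends on Twitter search, so treat these as a sketch):

twint -s "pineapple OR mango"    # Tweets containing either word
twint -s "pineapple mango"       # Tweets containing both words (implicit AND)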

UnicodeEncodeError

I always get this error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 76: ordinal not in range(128)

I'm on the latest Raspbian Stretch

Linux raspberrypi 4.9.59-v7+ #1047 SMP Sun Oct 29 12:19:23 GMT 2017 armv7l

using

Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170124] on linux

locale is:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_GB.UTF-8
LANGUAGE=
LC_CTYPE=UTF-8
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
