twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.

License: MIT License

Python 99.82% Dockerfile 0.18%
osint twitter python scrape tweets elasticsearch kibana scrape-followers scrape-likes scrape-following

twint's Introduction

TWINT - Twitter Intelligence Tool



No authentication. No API. No limits.

Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API.

Twint utilizes Twitter's search operators to let you scrape Tweets from specific users, scrape Tweets relating to certain topics, hashtags, and trends, or sort out sensitive information from Tweets such as e-mail addresses and phone numbers. I find this very useful, and you can get really creative with it too.

Twint also makes special queries to Twitter, allowing you to scrape a Twitter user's followers, Tweets a user has liked, and who they follow, without any authentication, API, Selenium, or browser emulation.

tl;dr Benefits

Some of the benefits of using Twint vs. the Twitter API:

  • Can fetch almost all Tweets (Twitter API limits to last 3200 Tweets only);
  • Fast initial setup;
  • Can be used anonymously and without Twitter sign up;
  • No rate limitations.

Limits imposed by Twitter

Twitter limits scrolling while browsing the user timeline. This means that with .Profile or with .Favorites you will be able to get ~3200 Tweets.
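
A common workaround (a usage pattern, not a Twint feature) is to page through older history with Search over explicit date windows, since search queries are not bound by the timeline cap. A minimal sketch, assuming the Config attributes shown elsewhere in this README:

import twint

# Sketch: fetch one month-sized window; repeat with shifted
# Since/Until values to walk further back than ~3200 Tweets.
c = twint.Config()
c.Username = "username"
c.Since = "2017-01-01"
c.Until = "2017-02-01"
c.Store_csv = True
c.Output = "archive.csv"
twint.run.Search(c)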

Requirements

  • Python 3.6;
  • aiohttp;
  • aiodns;
  • beautifulsoup4;
  • cchardet;
  • dataclasses;
  • elasticsearch;
  • pysocks;
  • pandas (>=0.23.0);
  • aiohttp_socks;
  • schedule;
  • geopy;
  • fake-useragent;
  • py-googletransx.

Installing

Git:

git clone --depth=1 https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt

Pip:

pip3 install twint

or

pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint

Pipenv:

pipenv install git+https://github.com/twintproject/twint.git#egg=twint

March 2, 2021 Update

Added: Dockerfile

Noticed a lot of people are having issues installing (including me). Please use the Dockerfile temporarily while I look into them.

CLI Basic Examples and Combos

A few simple examples to help you understand the basics:

  • twint -u username - Scrape all the Tweets of a user (doesn't include retweets but includes replies).
  • twint -u username -s pineapple - Scrape all Tweets from the user's timeline containing pineapple.
  • twint -s pineapple - Collect every Tweet containing pineapple from everyone's Tweets.
  • twint -u username --year 2014 - Collect Tweets that were tweeted before 2014.
  • twint -u username --since "2015-12-20 20:30:15" - Collect Tweets that were tweeted since 2015-12-20 20:30:15.
  • twint -u username --since 2015-12-20 - Collect Tweets that were tweeted since 2015-12-20 00:00:00.
  • twint -u username -o file.txt - Scrape Tweets and save to file.txt.
  • twint -u username -o file.csv --csv - Scrape Tweets and save as a csv file.
  • twint -u username --email --phone - Show Tweets that might have phone numbers or email addresses.
  • twint -s "Donald Trump" --verified - Display Tweets by verified users that Tweeted about Donald Trump.
  • twint -g="48.880048,2.385939,1km" -o file.csv --csv - Scrape Tweets from a radius of 1km around a place in Paris and export them to a csv file.
  • twint -u username -es localhost:9200 - Output Tweets to Elasticsearch.
  • twint -u username -o file.json --json - Scrape Tweets and save as a json file.
  • twint -u username --database tweets.db - Save Tweets to a SQLite database.
  • twint -u username --followers - Scrape a Twitter user's followers.
  • twint -u username --following - Scrape who a Twitter user follows.
  • twint -u username --favorites - Collect all the Tweets a user has favorited (gathers ~3200 Tweets).
  • twint -u username --following --user-full - Collect full user information of the people a user follows.
  • twint -u username --timeline - Use an effective method to gather Tweets from a user's profile (Gathers ~3200 Tweets, including retweets & replies).
  • twint -u username --retweets - Use a quick method to gather the last 900 Tweets (that includes retweets) from a user's profile.
  • twint -u username --resume resume_file.txt - Resume a search starting from the last saved scroll-id.

More details about the commands and options are located in the wiki.

Module Example

Twint can now be used as a module and supports custom formatting. More details are located in the wiki.

import twint

# Configure
c = twint.Config()
c.Username = "realDonaldTrump"
c.Search = "great"

# Run
twint.run.Search(c)

Output

955511208597184512 2018-01-22 18:43:19 GMT <now> pineapples are the best fruit

import twint

c = twint.Config()

c.Username = "noneprivacy"
c.Custom["tweet"] = ["id"]
c.Custom["user"] = ["bio"]
c.Limit = 10
c.Store_csv = True
c.Output = "none"

twint.run.Search(c)
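
The wiki also documents a Format option for fully custom output lines; a minimal sketch, assuming the {token} placeholder names used there:

import twint

c = twint.Config()
c.Username = "noneprivacy"
# Token names ({id}, {date}, {time}, {username}, {tweet}) are assumed
# from the wiki's custom-formatting page.
c.Format = "{id} | {date} {time} | {username}: {tweet}"
twint.run.Search(c)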

Storing Options

  • Write to file;
  • CSV;
  • JSON;
  • SQLite;
  • Elasticsearch.
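
When using Twint as a module, each storing option maps to a Config attribute; a sketch of the common combinations, assuming the attribute names mirror the CLI flags used above:

import twint

c = twint.Config()
c.Username = "username"
c.Store_json = True            # JSON, like --json
c.Output = "tweets.json"
# c.Store_csv = True           # CSV, like --csv (set Output to a .csv path)
# c.Database = "tweets.db"     # SQLite, like --database
twint.run.Search(c)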

Elasticsearch Setup

Details on setting up Elasticsearch with Twint are located in the wiki.
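
For module use, the CLI's -es flag corresponds to the Elasticsearch attribute; a minimal sketch, assuming an instance on the default port:

import twint

c = twint.Config()
c.Username = "username"
c.Elasticsearch = "localhost:9200"  # same value the -es flag takes
twint.run.Search(c)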

Graph Visualization


Graph details are also located in the wiki.

We are developing a Twint Desktop App.


FAQ

I tried scraping tweets from a user, I know that they exist but I'm not getting them

Twitter can shadow-ban accounts, which means that their Tweets will not be available via search. To work around this, pass --profile-full if you are using Twint via the CLI or, if you are using Twint as a module, add config.Profile_full = True. Please note that this process will be quite slow.
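
As a module, that looks like this:

import twint

c = twint.Config()
c.Username = "username"
c.Profile_full = True  # slower, but catches Tweets hidden from search
twint.run.Profile(c)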

More Examples

Followers/Following

To get only follower usernames/following usernames

twint -u username --followers

twint -u username --following

To get user info of followers/following users

twint -u username --followers --user-full

twint -u username --following --user-full

userlist

To get only the user info of a user

twint -u username --user-full

To get user info of users from a userlist

twint --userlist inputlist --user-full
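
The same lookups as a module; a sketch using the run functions that mirror the CLI flags above:

import twint

c = twint.Config()
c.Username = "username"
c.User_full = True          # full user info, like --user-full
twint.run.Followers(c)      # like --followers
# twint.run.Following(c)    # like --following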

tweet translation (experimental)

To get 100 English Tweets and translate them to Italian

twint -u noneprivacy --csv --output none.csv --lang en --translate --translate-dest it --limit 100

or

import twint

c = twint.Config()
c.Username = "noneprivacy"
c.Limit = 100
c.Store_csv = True
c.Output = "none.csv"
c.Lang = "en"
c.Translate = True
c.TranslateDest = "it"
twint.run.Search(c)


Contact

If you have any questions, want to join in discussions, or need extra help, you are welcome to join our Twint-focused channel at OSINT team.


twint's Issues

Some questions regarding CSV

Question:

  1. Are pipe characters (|) properly escaped when dumping output to CSV (for compatibility)?
  2. Why are Emojis represented as <Emoji: blah> instead of proper Unicode?
  3. Can some of the link abbreviations be unpacked to their original form, or put in a separate column?

Append to csv/file on update

Instead of running the script and having it download and save all output to a file each time, is there a way to just append the latest Tweets to the file? Some way of comparing what we have versus what we're missing? (A possible approach is sketched after this question.)

Thanks
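
For the append question above, Twint itself only offers --resume; one possible approach (a sketch, not a Twint feature) is to diff on Tweet IDs before appending:

import csv

# Sketch: append only rows whose Tweet ID is not already in the CSV.
# Assumes a tab-delimited file whose first column is the Tweet ID.

def known_ids(path):
    try:
        with open(path, newline="", encoding="utf-8") as f:
            return {row[0] for row in csv.reader(f, delimiter="\t") if row}
    except FileNotFoundError:
        return set()

def append_new(path, rows):
    seen = known_ids(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for row in rows:
            if row[0] not in seen:
                writer.writerow(row)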

Extract user tweets by userID instead of name/handle

If I have a list of Twitter users I have to scrape, but the list contains IDs instead of handles, is it possible to use a converter like https://tweeterid.com/ to do the conversion?
Twitter has an official solution, which is scraping a redirect from https://twitter.com/intent/user?user_id=${number}, however there is no shortcut in the search bar to do it. (A scripted version of this trick is sketched below.)
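
The redirect trick described above can be scripted; a sketch (not part of Twint, and assuming Twitter answers the intent URL with a redirect to the profile):

import requests

def handle_from_id(user_id):
    # Follow the intent URL; if Twitter redirects to the profile,
    # the handle is the last path segment of the final URL.
    r = requests.get(
        "https://twitter.com/intent/user?user_id={}".format(user_id),
        allow_redirects=True, timeout=10)
    return r.url.rstrip("/").rsplit("/", 1)[-1]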

Problem with -o --csv in the new Twint version released today

Initial Check

Make sure you've checked the following.

  • [] Python version is 3.5 or higher.
  • [] Using the latest version of Twint.

If this is a feature request please specify in the title:

Example:

[REQUEST] More features!

Command

Please provide the exact command ran including the username/search so I may reproduce the issue.

Description of Issue

Please use as much detail as possible.

OS Details

Using Windows, Linux? What OS version?

The -o parameter in the new version released today does not work

Initial Check

Make sure you've checked the following.

  • Python version is 3.5 or higher.
  • Using the latest version of Twint.

Command

python twint.py -u nestorpomar --output file.csv --csv
python twint.py -u nestorpomar -o file.csv --csv

Description of Issue

With the previous Twint version I didn't have any problem writing CSV files.
I just installed the new version released today, and when I use -o or --output, the execution gives the following error message:

Traceback (most recent call last):
  File "twint.py", line 104, in <module>
    main()
  File "twint.py", line 64, in main
    if args.csv and args.o is None:
AttributeError: 'Namespace' object has no attribute 'o'

If I try with the previous version, it works without problems.

OS Details

ubuntu 16

Does not work with some twitter accounts

Twitter accounts with "This profile may include potentially sensitive content" on the front,
where you have to click through to access the content, will break Tweep.
The profile has a button that looks like:
<button class="EdgeButton EdgeButton--tertiary ProfileWarningTimeline-button">Yes, view profile</button>

Tweep stops scraping after about 15000/20000 tweets back

Command

python3 tweep.py -s eurusd

Description of Issue

I want to ask why, when I start scraping for a keyword, the scraping process ends after going about 10 days back.
I would like to scrape at least 1000 days back.
The number of scraped Tweets changes each time I run "python3 tweep.py -s eurusd".

OS Details

Debian 9

[Error] Failing before limit

I've run Twint successfully a lot with no issues (thanks! it's great!), but starting last night, a number of attempts have run into problems. I'm wondering what you make of these errors. I do have Python 3.6.5 and the latest version of Twint. I'm running on Mac OS High Sierra 10.13.4.

python3 twint.py -u NickKristof -o NickKristof_m_lib_twint.csv --csv
(runs for a few hundred tweets and then...)

Traceback (most recent call last):
  File "twint.py", line 193, in fetch
    async with session.get(url) as response:
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 783, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 333, in _request
    await resp.start(conn, read_until_eof)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 695, in start
    (message, payload) = await self._protocol.read()
  File "/usr/local/lib/python3.6/site-packages/aiohttp/streams.py", line 533, in read
    await self._waiter
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "twint.py", line 771, in <module>
    loop.run_until_complete(main())
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
    return future.result()
  File "twint.py", line 689, in main
    feed, init, count = await getTweets(init)
  File "twint.py", line 510, in getTweets
    tweets, init = await getFeed(init)
  File "twint.py", line 291, in getFeed
    response = await fetch(session, await getUrl(init))
  File "twint.py", line 194, in fetch
    return await response.text()
  File "/usr/local/lib/python3.6/site-packages/async_timeout/__init__.py", line 38, in __exit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.6/site-packages/async_timeout/__init__.py", line 83, in _do_exit
    raise asyncio.TimeoutError
concurrent.futures._base.TimeoutError

Doesn't scrape Tweets till the specified date

Initial Check

Make sure you've checked the following.

  • Python version is 3.5 or higher.
  • Using the latest version of Tweep.

Command

python tweep.py -s Bitcoin --since 2018-01-01 -o file.csv --csv

Description of Issue

The module stops scraping Tweets beyond a certain time frame, whereas it is expected to scrape all posts since the 1st of January, as specified in the command-line args.
Example: If the latest Tweet was on 2018-03-16 21:30:57 IST, the module retrieves Tweets till about 2018-03-16 06:54:15 IST.

OS Details

OS Name: Microsoft Windows 10 Home Single Language
OS Version: 10.0.16299 N/A Build 16299

__getitem__ error

Previously I was able to use Tweep. Now not anymore.
I get this error:

TypeError: 'NoneType' object has no attribute '__getitem__'

what next?

lorenzo

Why this gap between Tweets posted and Tweets downloaded? [no bug]

I don't think this is a bug, but I'm just wondering why there is such a gap between the number of Tweets I'm supposed to have posted (+/- 1700) and the Tweets I really download (+/- 770).
Using the --count option was really disappointing from this point of view. Any idea about that?

friends and followers

Hi,

do you think the code could be easily extended to also mine the followers and friends of a (list of) user(s)?

Thanks

TypeError: 'NoneType'

I've been testing the script with different usernames and languages, and it's showing an error at line 78.

It retrieves four Tweets and then it breaks, showing the following error:

Traceback (most recent call last):
  File "tweep.py", line 168, in <module>
    tweep().main()
  File "tweep.py", line 130, in main
    self.get_tweets()
  File "tweep.py", line 78, in get_tweets
    datestamp = tweet.find('a','tweet-timestamp')['title'].rpartition(' - ')[-1]
TypeError: 'NoneType' object has no attribute '__getitem__'

[REQUEST] New fields in CSV, JSON and DB; header in CSV - modified code included

Initial Check

Make sure you've checked the following.

  • Python version is 3.5 or higher.
  • Using the latest version of Twint. (I realize you made some changes in the last hours, so it's not the latest but yesterday's version.)

If this is a feature request please specify in the title:

Example:

[REQUEST]
I want to get and store some fields that I think your code doesn't handle out of the box.
The fields are:

  • user_id
  • mentions
  • permanent link

Apart from that, I want to include a header in the CSV output describing what each column is, but only if the file does not exist.

I want to share with you the modifications I did, just in case you consider they could be interesting for you and the rest of the people.

I checked the CSV and JSON outputs but not storing the data in a database.
The lines I modified / included are marked with "added" comments below.

file: db.py (it confused me, as the code is commented....)

import os
import sqlite3

table_tweets = """
CREATE TABLE IF NOT EXISTS
    tweets (
        id integer primary key,
        date text not null,
        time text not null,
        timezone text not null,
        user text not null,
        tweet text not null,
        replies integer,
        likes integer,
        retweets integer,
        hashtags text,
        mentions text,       -- added
        userid integer,      -- added
        permanentlink text   -- added
    );
"""

def tweets(conn, Tweet):
    try:
        cursor = conn.cursor()
        entry = (Tweet.id,
                 Tweet.datestamp,
                 Tweet.timestamp,
                 Tweet.timezone,
                 Tweet.username,
                 Tweet.tweet,
                 Tweet.replies,
                 Tweet.likes,
                 Tweet.retweets,
                 ",".join(Tweet.hashtags),
                 ",".join(Tweet.mentions),  # added
                 Tweet.userid,              # added
                 Tweet.permanentlink,)      # added
        cursor.execute("INSERT INTO tweets VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?)", entry)
        conn.commit()
    except sqlite3.IntegrityError:
        pass

====================
file: tweet.py

class Tweet(object):
    id = ""
    date = ""
    datestamp = ""
    time = ""
    timestamp = ""
    timezone = ""
    username = ""
    tweet = ""  # text
    replies = "0"
    likes = "0"
    retweets = "0"
    hashtags = ""
    mentions = ""        # added
    userid = ""          # added
    permanentlink = ""   # added

==================
file: output.py

import csv   # imports needed by the excerpts below
import json
import os
import re

def writeJSON(Tweet, file):
    data = {
        "id": Tweet.id,
        "date": Tweet.datestamp,
        "time": Tweet.timestamp,
        "timezone": Tweet.timezone,
        "username": Tweet.username,
        "tweet": Tweet.tweet,
        "replies": Tweet.replies,
        "retweets": Tweet.retweets,
        "likes": Tweet.likes,
        "hashtags": ",".join(Tweet.hashtags),
        "mentions": ",".join(Tweet.mentions),    # added
        "user_id": Tweet.userid,                 # added
        "permanent_link": Tweet.permanentlink}   # added

    with open(file, "a", newline='', encoding="utf-8") as json_file:
        json.dump(data, json_file)
        json_file.write("\n")

def writeCSV(Tweet, file):
    data = [
        Tweet.id,
        Tweet.datestamp,
        Tweet.timestamp,
        Tweet.timezone,
        Tweet.username,
        Tweet.tweet,
        Tweet.replies,
        Tweet.retweets,
        Tweet.likes,
        ",".join(Tweet.hashtags),
        ",".join(Tweet.mentions),   # added
        Tweet.userid,               # added
        Tweet.permanentlink]        # added
    if not os.path.exists(file):    # added: write a header row for new files
        with open(file, "a", newline='', encoding="utf-8") as csv_file:
            writer = csv.writer(csv_file, delimiter="\t")
            writer.writerow(["tweetid", "date", "time", "timezone", "user", "text",
                             "replies", "retweets", "likes", "hashtags",
                             "mentions", "user_id", "permanent Link"])
            writer.writerow(data)
    else:
        with open(file, "a", newline='', encoding="utf-8") as csv_file:
            writer = csv.writer(csv_file, delimiter="\t")
            writer.writerow(data)

def getMentions_nes(text):   # added
    mentions = re.findall(r'(?i)@\w+', text, flags=re.UNICODE)
    return mentions
    #return ",".join(mentions)

Sort HTML

import datetime, sys               # imports needed by this excerpt
from time import gmtime, strftime

def getTweet(tw, config):
    t = Tweet()
    t.id = tw.find("div")["data-item-id"]
    t.date = getDate(tw)
    if config.Since and config.Until:
        if (t.date.date() - datetime.datetime.strptime(config.Since, "%Y-%m-%d").date()).days == -1:
            # mitigation here, maybe find something better
            sys.exit(0)
    t.datestamp = t.date.strftime("%Y-%m-%d")
    t.time = getTime(tw)
    t.timestamp = t.time.strftime("%H:%M:%S")
    t.username = tw.find("span", "username").text.replace("@", "")
    t.timezone = strftime("%Z", gmtime())
    for img in tw.findAll("img", "Emoji Emoji--forText"):
        img.replaceWith("<{}>".format(img['aria-label']))
    t.tweet = getMentions(tw, getText(tw))
    t.hashtags = getHashtags(t.tweet)
    t.mentions = getMentions_nes(t.tweet)   # added
    t.replies = getStat(tw, "reply")
    t.retweets = getStat(tw, "retweet")
    t.likes = getStat(tw, "favorite")
    t.userid = tw.find("div")["data-user-id"]                                        # added
    t.permanentlink = 'https://twitter.com' + tw.find("div")["data-permalink-path"]  # added

    return t

===================

OS Details

Using Windows, Linux? Linux
What OS version? Ubuntu 16

thank you very much for your help

Not getting data from a username

I ran Twint against a username yesterday and it worked great. Today I'm trying to do the same, and the command is just sitting there, not doing anything. If I choose another username it seems to work. I don't see anywhere that it shows a log file to maybe try to debug the issue or find out why it isn't working now.

username is @rusthackreport

Cheers

UserID as data column

For ease of retrieval, it would be great if the UserID could be used for getting Tweets (instead of only the username).
The same can be said for TweetID, reply/quote TweetID, etc.

entry for table 'users' shows none for multiuser entry

When using python3 tweep.py -s 'truedonaldtrump' --database test.db, the saved table 'users' in SQLite contains the entry 'None' instead of the usernames, so multi-user entries won't work.
If you use python3 tweep.py -u 'truedonaldtrump' --database test.db, the entry in 'users' is right.
Bug or feature?

Tweep vs Tweepy

What are some of the disadvantages of using Tweep compared to APIs that use auth, such as Tweepy? The README only lists advantages.

e.g. Will Tweep miss tweets that the Twitter API wouldn't?

Event loop is running exception

Greetings! I would be glad to receive some clarification.

Command

import twint

c = twint.Config()
c.Search = 'sometext'
c.Since = '2018-04-26'
c.Until = '2018-04-26'
result = twint.Search(c)

Description of Issue

Ran this code. Exception happening:

~/Analysis/toolbox/twint/twint/search.py in __init__(self, config)
     35 
     36                 loop = asyncio.get_event_loop()
---> 37                 loop.run_until_complete(self.main())
     38 
     39         async def Feed(self):

/usr/lib/python3.5/asyncio/base_events.py in run_until_complete(self, future)
    373         future.add_done_callback(_run_until_complete_cb)
    374         try:
--> 375             self.run_forever()
    376         except:
    377             if new_task and future.done() and not future.cancelled():

/usr/lib/python3.5/asyncio/base_events.py in run_forever(self)
    338         self._check_closed()
    339         if self.is_running():
--> 340             raise RuntimeError('Event loop is running.')
    341         self._set_coroutine_wrapper(self._debug)
    342         self._thread_id = threading.get_ident()

RuntimeError: Event loop is running.

Afterwards, the Jupyter kernel just dies.

OS Details

Ubuntu 16.04, jupyter notebook

Possibility of downloading threads w/replies?

Initial Check

Make sure you've checked the following.

  • [] Python version is 3.5 or higher.
  • [] Using the latest version of Tweep.
    All good here.

Command

Please provide the exact command ran including the username/search so I may reproduce the issue.

Looking to see if the option --to:username can be added to cover replies if possible given how the tool works.

Description of Issue (this is a feature request, not so much an issue)

If possible, a way to get an entire Twitter thread started by a given user, or one a given user has participated in, would help in providing context. I understand if this is not feasible given the way the tool works, but I wanted to see if that possibility had been explored. Otherwise this tool is brilliant!

OS Details

Using Windows, Linux? What OS version?

terminal-free usage

Description of Issue

The current usage of Tweep requires the CLI, so in a Jupyter notebook !python3 tweep.py ... is needed for use in other scripts.
It would be great if there were a way of passing the args without the need for terminal calls.

live streaming tweets

First of all, thanks for this tool, it's very helpful 😁
How do I collect, as a live stream, every Tweet containing pineapple?

Scraping Stopping at between 8k and 12k tweets

First of all, thank you for such a great tool!

Description of Issue

I am scraping an entire timeline of a user w/ 150K+ Tweets. The output runs fine for 8-12k Tweets (usually about 30 days), and then stops. I have seen previous reports of scraping stopping, but I'm not sure if they are related. I have been adding the --until flag and working through approx 30-60 days at a time. Is this a bug, user error, or a Twitter limitation? Is there an easy way to batch commands together and run in chunks w/ --since and --until boundaries? (One way to batch the windows is sketched after this issue.)

OS Details

Ubuntu / Buscador OSINT

Initial Check

Make sure you've checked the following.

  • [YES] Python version is 3.5 or higher.
  • [YES] Using the latest version of Twint.

Command

python3 twint.py -u username -o timeline.csv --csv

Once it fails, I then add a --until flag:
python3 twint.py -u username --until 2017-06-01 -o timeline.csv --csv

Thank you!
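
For the batching question above, one way (a sketch, not a Twint feature) is to drive the CLI over fixed --since/--until windows from a small script:

import datetime
import subprocess

# Sketch: scrape a long timeline in 30-day chunks so each run stays
# below the point where scraping tends to stall.
start = datetime.date(2017, 1, 1)
end = datetime.date(2018, 1, 1)
step = datetime.timedelta(days=30)

while start < end:
    until = min(start + step, end)
    subprocess.run([
        "twint", "-u", "username",
        "--since", start.isoformat(), "--until", until.isoformat(),
        "-o", "timeline.csv", "--csv",
    ], check=False)
    start = until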

Month Scrape Limit

It seems tweet scraping is limited to a month back from the specified date, even when using the following commands:

python twint.py -u BBC --since 2006-03-21 -o BBC.csv --csv
python twint.py -u BBC --since 2006-03-21 --until 2017-06-01 -o BBC.csv --csv

This might be an issue, or am I overlooking an option?

I am running Twint on Windows 10, with Python 3.6.4 (Anaconda 64 bit).

Since and until together

Hey There,

Thanks for this great code repository!

The following command captures Tweets older than 1 December 2017. Is my command wrong?

Command

python3 twint.py -s metoo -l de -o test2.csv --csv --until 2017-12-02 --since 2017-12-01

Description of Issue

Combination of since and until does not work

OS Details

Mac OS

Cheers,
Felix

Scrape tweets from city

Initial Check

Make sure you've checked the following.

  • Python version is 3.5 or higher.
  • Using the latest version of Tweep.

Command

Please provide the exact command ran including the username/search so I may reproduce the issue.

python tweep.py -s hello -g -38.416097,-63.616671999999994,100km -o file.csv --csv

Description of Issue

I'd like to scrape Tweets from the city of Buenos Aires. When I try it, I get this error:

usage: python3 tweep.py [options]
tweep.py: error: argument -g: expected one argument

How can I solve it?

OS Details

Using Windows, Linux? What OS version?
Windows

Can't collect more than 213947 tweets

Initial Check

Make sure you've checked the following.

  • [y] Python version is 3.5 or higher.
  • [y] Using the latest version of Tweep.

Command

tweep.py -s #metoo --since 2017-10-01 --stats --count -o metoo_data.csv --csv

Description of Issue

I am trying to collect tweets containing #MeToo since 1 October 2017, however the scraper always stops when it gets to about 213947 tweets. This means that the scraper begins at the current date, starts to scrape backwards in time, but it never gets past a few weeks of tweets because it hits the (apparent) limit.

I have tried to exclude 2018, which means that the scraper begins at 31 December 2017, but again it only manages to scrape a few weeks of tweets before it hits the limit.

Any possible solutions much appreciated.

OS Details

Windows 10; completely up to date.

Install tweep

Initial Check

Make sure you've checked the following.

  • [] Python version is 3.5 or higher.
  • [] Using the latest version of Tweep.

Command

Please provide the exact command ran including the username/search so I may reproduce the issue.

Description of Issue

Please use as much detail as possible.

OS Details

Using Windows, Linux? What OS version?

[Windows] Python (Anaconda) UnicodeEncodeError

Hello,
When I run it I get this error. How should I fix it?
Traceback (most recent call last):
  File "tweep.py", line 289, in <module>
    loop.run_until_complete(main())
  File "C:\Users\ION\Anaconda3\lib\asyncio\base_events.py", line 467, in run_until_complete
    return future.result()
  File "tweep.py", line 237, in main
    feed, init, count = await getTweets(init)
  File "tweep.py", line 206, in getTweets
    print(await outTweet(tweet))
  File "tweep.py", line 182, in outTweet
    writer.writerow(dat)
  File "C:\Users\ION\Anaconda3\lib\encodings\cp1253.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 92-93: character maps to <undefined>
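
A common workaround for charmap errors like the one above (general Python advice, not from this thread) is to force UTF-8 output regardless of the Windows console code page:

import io
import sys

# Sketch: re-wrap stdout so printing characters outside cp1253
# doesn't raise; lossy replacement instead of a crash.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace")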

Replace comma separator in csv format by 'pipe'

Hi,

May I suggest replacing the "," separator in the CSV format with "|"? Or simply quoting the fields of the CSV file.
The comma is difficult to interpret when there are commas in the tweet message itself. (Both options are sketched below.)

I love this tool. Thanks.
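
Quoting alone is enough for standards-compliant readers; a sketch of both suggestions using only the standard library:

import csv

row = ["955511208597184512", "2018-01-22", "pineapples, the best fruit"]

# Option 1: keep commas but quote every field (RFC 4180 style).
with open("quoted.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f, quoting=csv.QUOTE_ALL).writerow(row)

# Option 2: use a pipe delimiter instead.
with open("piped.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f, delimiter="|").writerow(row)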

The scraping is stopping

First of all, congratulations on your work; here at the University of Goiás, Brazil, this code is helping us so much.
After you made some modifications and added other commands, I think something is going wrong, and I want to know if you can help me.
For example, the command python3 tweep.py -s "bolsonaro2018" --since 2018-01-01 -o testbolsonaro.csv --csv stops on the date 2018-04-03. I have already tried on Windows 10 and Ubuntu 17.10, with the current version of Tweep and older ones.
Thanks

Limits >= 20 don't work, Tweets option doesn't exist

Initial Check

Make sure you've checked the following.

  • [] Python version is 3.5 or higher. True
  • [] Using the latest version of Tweep. True

Command

python tweep.py -u {account} --limit 20

Description of Issue

For account = "binance":
Only the 5 latest Tweets get parsed, plus one Tweet from 2017 by @bi_15174872228.

For account = "vergecurrency":
If --limit < 20, Tweets get parsed indefinitely (all of them).
If --limit >= 20, exactly 20 Tweets get parsed no matter the --limit value.

OS Details

Windows 8.1 64

UPD: Just noticed that tweets on line 180 doesn't exist, therefore --tweets doesn't work.

[Idea] Tweets that are against the ToS

Got a crazy idea for a new feature to filter out Tweets that are against the ToS. It would be useful for a few reasons:

  • Make suspending a person's Twitter account easier
  • Make Twitter a safer place

Our SJW users will love it! lol I'll work on this later this week.

Custom percent-sign output format and/or CSV support

Assuming that the data downloaded from twitter are uniformed "tuples", is it possible to format the text into a CSV format for compatibility and easy formatting?
One of the major problem though is that according to RFC4180, double quotes, commas and newlines all requires escape characters.
If not, it would be nice if people are allowed to use percent-escape form (like the date bash function) to customize to their own needs.
This idea came from https://github.com/jonbakerfish/TweetScraper where they literally have MongoDB as part of their application, such that everything is in a regular and organized manner.

Error while getting Tweets removed for Copyright

The command was :

./tweep.py -u --csv -o tweets.csv

Traceback (most recent call last):
  File "./tweep.py", line 145, in <module>
    loop.run_until_complete(main())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "./tweep.py", line 115, in main
    feed, init = await getTweets(init)
  File "./tweep.py", line 72, in getTweets
    datestamp = tweet.find("a", "tweet-timestamp")["title"].rpartition(" - ")[-1]
TypeError: 'NoneType' object is not subscriptable

multiple search

Hi, is it possible to do a search with multiple keywords using 'and' & 'or' operators?
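
Since Twint hands the -s string to Twitter's search, Twitter's own operators should apply (behavior ultimately depends on Twitter search, so treat these as a sketch):

twint -s "pineapple OR mango"    # Tweets containing either word
twint -s "pineapple mango"       # Tweets containing both words (implicit AND)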

UnicodeEncodeError

I always get this error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 76: ordinal not in range(128)

I'm on the latest Raspbian Stretch

Linux raspberrypi 4.9.59-v7+ #1047 SMP Sun Oct 29 12:19:23 GMT 2017 armv7l

using

Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170124] on linux

locale is:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_GB.UTF-8
LANGUAGE=
LC_CTYPE=UTF-8
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
