jefferson-henrique / getoldtweets-python
A project written in Python to get old tweets; it bypasses some limitations of the official Twitter API.
License: MIT License
Hi, many thanks for this wonderful package!
I am a beginner at everything here, including GitHub, Python, and tweet scraping, so sorry in advance if this is a dumb question. Is there a way to set a language criterion through the got package? There is no 'setLang'-style criterion in the Criteria file.
Many thanks
Shadi
I just searched for #Raees, but it gave only 268 results. Same search with tweepy library gave 1012 results in 10 minutes.
I have installed both lxml and pyquery on Windows 10.
I ran this command at the command prompt; the Python installation directory is on the PATH.
python Exporter.py --username 'barackobama' --maxtweets 1
File "Exporter.py", line 8
print 'You must pass some parameters. Use "-h" to help.'
^
SyntaxError: Missing parentheses in call to 'print'
Kindly help me resolve this issue.
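This SyntaxError is what a Python 3 interpreter reports when it meets a Python 2 style print statement, so the script is probably being run under Python 3. A minimal sketch of the Python 3 compatible form of that line (usage_message is a hypothetical helper name, not part of Exporter.py):

```python
import sys

def usage_message():
    """Return the help hint that the failing line tries to print."""
    return 'You must pass some parameters. Use "-h" to help.'

# print is a function in Python 3, so the argument goes in parentheses.
print(usage_message())
print("Running on Python %d.%d" % sys.version_info[:2])
```

Running the original script with a Python 2.7 interpreter is the other option, since the code on this page was written for Python 2.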
While using the following query to get twitter timeline by username (python Exporter.py --username "barackobama" --since 2015-09-10 --until 2015-09-12 --maxtweets 0) the output file includes tweets by a different username. Upon inspecting the webpage on twitter, I believe re-tweets and mentions which appear on one's timeline are also being crawled as belonging to the user. Does the crawler differentiate between the tweets on one's timeline? Thanks
Hi, there is another issue.
In some cases, your code produces an empty CSV file (not exactly empty, just the header).
I looked into it, and there are two cases.
One is a user who hasn't tweeted for a long time. The other may be an IP ban: I used this code to crawl some organizations' Twitter accounts, and it sometimes gives empty CSV files.
Is there any way to avoid these cases, especially the first one (a long time without tweeting)?
Hi, I have a vanilla Debian OS and am attempting to use your script. Can you share some instructions on the packages I would need to set up to run the script?
For example: sudo pip install pyquery
Thanks
Example scenario: I follow a user on twitter but their account is locked. Therefore I can see their tweets but cannot use the TweetManager to export them. Do you have any strategies to allow TweetManager to authenticate into my account before getting the JsonResponse?
This is a very useful piece of code. Great work.
I am looking forward to using the code for one of my data mining projects at school.
I downloaded the Python code and ran the samples, but the "Get tweets by query search / username bound range" sample fails.
The same query works in the Java version of this project. I was wondering if I am missing something?
Hi there,
I am no Python pro but would like to use this package. I've cloned the repo and navigated to the folder. I then launched it with
python Export.py --username 'barackobama' --since 2015-09-10 --until 2015-09-12 --maxtweets
I get the following error:
File "Exporter.py", line 6
print 'You must pass some parameters. Use "-h" to help.'
^
SyntaxError: Missing parentheses in call to 'print'
Can someone show me a simple example of using this?
Thanks
What is the maximum number of tweets that can be extracted, and what is the maximum time span that can be chosen? I am trying to extract 1000 tweets from a one-month period (3 years ago), but I end up with only 1 tweet. How can I change the code to get the expected results?
Thanks a lot for releasing this code! It was working great till today when I started encountering ImportError: No module named got
Could you let me know how this can be fixed?
Thanks!
I have been using your code to retrieve tweets dating back a couple of years but just today I was not able to get any tweets older than 10 days. It still works with the web browser but not using your code. I wonder what changed?
First of all, thank you for your code; it helped me a lot!
What I'm trying to do is get tweet statuses in JSON format so I can collect mentions of my Twitter account.
Is it possible to bypass the Twitter API for this case using your embedded Twitter querying method?
("https://twitter.com/i/search/timeline?f=realtime&q=%s&src=typd&max_position=%s")
I'm trying to parse the HTML with regular expressions right now, but it is not working out smoothly.
I ran the given code yesterday, and it successfully fetched all the tweets between the specified dates for the mentioned tags. However, I have the concerns below; please let me know your comments:
Thanks!
Hello:
I tried the last example today and it does not work anymore. It returned 0 tweets. Everything was fine yesterday, so do you know what the problem is?
Thank you
Hello,
First, the project is perfect, thank you.
I have a problem: I'm using Example 2, but as a result I got only one tweet. Should I change something to get all tweets for a keyword and a given time period?
Hi
How can I filter tweets by language?
@Jefferson-Henrique Does this search give Retweets by the user or just the original tweets ?
I found out yesterday that Exporter.py is not actually returning all tweets.
Following your example, I crawled Twitter with username 'barackobama' and no other options such as 'maxtweets'; it returns only a handful of tweets (25), starting from a week ago.
Maybe this code was detected and blocked?
Great work!
I had to change the url in TweetManager.py to:
url = "https://twitter.com/i/search/timeline?f=tweets&vertical=default&q=%s&src=typd&max_position=%s"
Using the original url, I only got the "Top" results from the search - not the "Live" ones.
Maybe twitter changed the http-address?
I am happy to push the corrected version if you wish?
Again thanks a lot for providing this great program!
I think Twitter may have determined that I had been abusing their service. Yesterday, I ran your program to collect over 20,000 tweets. It ran fine. Each call used setSince and setUntil dates, with setMaxTweets set to 10.
Today I changed setMaxTweets to 20. It worked for the first 10 calls or so and then stopped. I had modified your program to carry out some pre- and post-processing. I reverted to the original copy, latest version, and from what I can see it is not returning any tweets for the dates specified in the tweetCriteria. Have you ever encountered this? Any tips? I will try it from a different IP address later.
The error is as follows:
Traceback (most recent call last):
  File "Main.py", line 32, in <module>
    main()
  File "Main.py", line 21, in main
    tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
IndexError: list index out of range
I.e., no tweets are returned, but I am sure that some exist. This happens even with the second query from your program:
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('europe refugees').setSince("2015-05-01").setUntil("2015-09-30").setMaxTweets(1)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
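The IndexError itself comes from indexing [0] into an empty list, so the crash can be avoided even when the search returns nothing. A minimal sketch of such a guard (first_or_none is a hypothetical helper, and the lists below are stand-ins, not real library output):

```python
def first_or_none(results):
    """Return the first item of a result list, or None when the list is empty."""
    return results[0] if results else None

# Stand-ins for what a search call might hand back.
empty_results = []
some_results = ["tweet-1", "tweet-2"]

for results in (empty_results, some_results):
    tweet = first_or_none(results)
    if tweet is None:
        print("No tweets returned for these criteria")
    else:
        print("First tweet:", tweet)
```

This only hides the crash; it does not explain why the search came back empty.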
Dear Jefferson
When I install requirements.txt, I get an ImportError.
What can I do?
(venv) C:\Users\seoul1\Dropbox\Mari\GetOldTweets-python-master>pip install -r requirements.txt
Traceback (most recent call last):
  File "C:\Python27\Lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\Lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Users\seoul1\Dropbox\Mari\GetOldTweets-python-master\venv\Scripts\pip.exe\__main__.py", line 5, in <module>
  File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\__init__.py", line 14, in <module>
    from pip.utils import get_installed_distributions, get_prog
  File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\utils\__init__.py", line 22, in <module>
    from pip.compat import console_to_str, expanduser, stdlib_pkgs
  File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\compat\__init__.py", line 13, in <module>
    from pip.compat.dictconfig import dictConfig as logging_dictConfig
  File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\compat\dictconfig.py", line 22, in <module>
    import logging.handlers
  File "C:\Python27\Lib\logging\handlers.py", line 26, in <module>
    import errno, logging, socket, os, cPickle, struct, time, re
  File "C:\Python27\Lib\socket.py", line 47, in <module>
    import _socket
ImportError: No module named _socket
I'm running this command:
python Exporter.py --querysearch 'europe refugees' --maxtweets 10
and got this error
Done. Output file generated "output_got.csv".
Traceback (most recent call last):
  File "Exporter.py", line 79, in <module>
    main(sys.argv[1:])
  File "Exporter.py", line 70, in main
    got.manager.TweetManager.getTweets(tweetCriteria, receiveBuffer)
  File "C:\Users\RAHUL_MSI\Anaconda2\lib\site-packages\got\manager\TweetManager.py", line 18, in getTweets
    if (tweetCriteria.username.startswith("\'") or tweetCriteria.username.startswith("\"")) and (tweetCriteria.username.endswith("\'") or tweetCriteria.username.endswith("\"")):
AttributeError: TweetCriteria instance has no attribute 'username'
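The traceback shows the username attribute being read on a criteria object that was built from --querysearch alone. A defensive sketch of that check, using a hypothetical stand-in class (Criteria here is not the library's TweetCriteria, and clean_username is an invented helper):

```python
class Criteria(object):
    """Hypothetical stand-in for a criteria object; attributes exist only once set."""

    def setQuerySearch(self, query):
        self.querySearch = query
        return self

def clean_username(criteria):
    """Return the username without wrapping quotes, or None if it was never set."""
    username = getattr(criteria, "username", None)  # no AttributeError when missing
    if not username:
        return None
    if username[0] in "'\"" and username[-1] in "'\"":
        username = username[1:-1]
    return username

print(clean_username(Criteria().setQuerySearch("europe refugees")))
```

Whether the project should apply the same guard inside TweetManager.py is for the maintainer to decide; this only illustrates the idea.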
Hi,
First of all, thank you for creating and sharing this module. I'm working on a project that requires specific tweets, and getting them only in the current week would take way too long, so I tried to take advantage of your solution. I'm using Python 3.5, and I unfortunately can't get it to work. I've got the requirements installed and I've also changed all of the print statements to match the new standard.
If I copy the contents of the got3 folder to the main directory and run the exporter, I receive the following error: "attempted relative import beyond top-level package". The line that's causing this problem is the following: "from .. import models" (in TweetManager).
Has anyone been able to use the script with Python 3.5 and would please help me out to get it working? Thanks in advance.
Hello,
I try to generate tweets that include user locations, but tweet.geo returns an empty string. And this is for users with location turned on. Any help?
Hey,
I've been facing these two issues consistently. Any help in fixing/understanding these will be highly appreciated.
Hey there, I'm new to Python, but I'd like to use this package and learn from experience.
I typed python Exporter.py --username "barackobama" --since 2015-06-01 --until 2016-02-01 --maxtweets 1000
but I get this error:
Traceback (most recent call last):
  File "Exporter.py", line 79, in <module>
    main(sys.argv[1:])
  File "Exporter.py", line 75, in main
    outputFile.close()
UnboundLocalError: local variable 'outputFile' referenced before assignment
What have I not done or done wrongly? Sorry if this is a beginner's question. I appreciate your help.
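An UnboundLocalError like this usually means a name is used on a path where its assignment never ran, for example when a file is closed in a cleanup step after an earlier failure prevented it from being opened. A sketch of one way to avoid that pattern, assuming nothing about Exporter.py beyond the traceback (export_rows and its arguments are hypothetical):

```python
import os
import tempfile

def export_rows(rows, path):
    """Write rows to path, one per line; the with block closes the file."""
    # If open() fails, the exception propagates directly and no cleanup code
    # touches a name that was never assigned.
    with open(path, "w") as output_file:
        for row in rows:
            output_file.write(row + "\n")
    return len(rows)

demo_path = os.path.join(tempfile.mkdtemp(), "output_demo.csv")
print(export_rows(["username;date;text", "someone;2016-01-01 10:00;hello"], demo_path))
```

With this shape, the real cause of the failure shows up in the traceback instead of being masked by the close() call.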
Thanks for your work, Jefferson. I really need it right now, but it does not work properly from the command line.
When I run Exporter.py --querysearch '#microsoft' --since 2016-01-01 --until 2016-01-31 --maxtweets 0
I get only part of the tweets, not all of them, usually covering less than two days ending on 2016-01-31. Can you help with this?
On occasion, with a largish number of max_tweets and a search range way in the past, the command line utility appears to get stuck (or maybe twitter is slowing down how quickly it returns pages).
Might it be worth trying to be more defensive on how you break out of the while True collection loop? Or perhaps appending tweets to the export file as you go along (eg save in batches of every 100 or so)?
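One way the suggestion above could look, as a sketch only: fetch_page and write_batch are hypothetical callables standing in for the page request and the file append, and max_empty_pages is an assumed safety limit; none of them come from the project's code.

```python
def collect(fetch_page, write_batch, max_items, batch_size=100, max_empty_pages=3):
    """Fetch pages until max_items is reached, flushing results in batches."""
    cursor, buffer, written, empty_pages = "", [], 0, 0
    while written + len(buffer) < max_items:
        items, cursor = fetch_page(cursor)
        if not items:
            # Give up after several empty pages in a row instead of looping forever.
            empty_pages += 1
            if empty_pages >= max_empty_pages:
                break
            continue
        empty_pages = 0
        room = max_items - written - len(buffer)
        buffer.extend(items[:room])
        if len(buffer) >= batch_size:
            write_batch(buffer)  # partial results survive a later hang or crash
            written += len(buffer)
            buffer = []
    if buffer:
        write_batch(buffer)
        written += len(buffer)
    return written

# Fake three-page source for demonstration; the last page is always empty.
pages = {"": ([1, 2, 3], "a"), "a": ([4, 5], "b"), "b": ([], "b")}
batches = []
print(collect(lambda cursor: pages[cursor], batches.append, max_items=10, batch_size=2))
```

A timeout on the underlying HTTP request would complement this, since the loop guard cannot help while a single request is blocked.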
Hello,
Would you happen to know if your program returns a lat long (geolocation) of the twitter posts?
So far I have yet to have any strings returned under geo, but perhaps it is due to what I am searching.
When using setUntil(), only a few dates work; others return this error:
lxml.etree.ParserError: Document is empty
This is my code:
max_tweets = 20
tweetCriteria = got3.manager.TweetCriteria().setUntil("2017-02-02").setQuerySearch("bitcoin").setMaxTweets(max_tweets).setLang(Lang="en")
for i in range(max_tweets):
    tweet = got3.manager.TweetManager.getTweets(tweetCriteria)[i]
    print(tweet.text)
    print(tweet.id)
    print(tweet.username)
    print(tweet.date)
Here is a little list of dates which work or not:
2017-02-05 error
2017-02-04 error
2017-02-03 ok
2017-02-02 error
2017-02-01 ok
2017-01-31 error
2017-01-30 ok
2017-01-29 ok
2017-01-28 error
2017-01-26 error
2017-01-25 ok
Also, when I change max_tweets to > 20, I get the same error!
Anybody an idea what happens here? Many thanks in advance.
It would be great to get tweets that contain a particular hashtag, rather than being limited to a username.
Hello,
I am trying to save the tweets as a CSV file, but I just get a SyntaxError in the command shell.
>>> python Exporter.py -h
SyntaxError: invalid syntax
I use Python 2.7.11.
The next question is: I receive just 1 tweet. I am a beginner with Python, and I don't know which numbers need to be changed.
Thanks.
For example, retrieving this tweet:
mcantor;2015-09-30 15:25;0;2;""It may take 24 hours to process your unsubscribe request" that's okay, it only takes 24 milliseconds to click the Report Spam button.";;;;"649304177638817792";https://twitter.com/mcantor/status/649304177638817792
Not sure if this is an issue with GetOldTweets-python or the CSV library it uses (does it use a CSV library?). But I believe double-quotes are supposed to be escaped by being repeated. So it should be
mcantor;2015-09-30 15:25;0;2;"""It may take 24 hours to process your unsubscribe request"" that's okay, it only takes 24 milliseconds to click the Report Spam button.";;;;"649304177638817792";https://twitter.com/mcantor/status/649304177638817792
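For comparison, Python's standard csv module doubles embedded quotes automatically. A sketch using the same tweet (Python 3; whether the project writes its file this way is not stated here, and the row layout below simply mirrors the line above):

```python
import csv
import io

text = ('"It may take 24 hours to process your unsubscribe request" '
        "that's okay, it only takes 24 milliseconds to click the Report Spam button.")
row = ["mcantor", "2015-09-30 15:25", 0, 2, text, "", "", "",
       "649304177638817792",
       "https://twitter.com/mcantor/status/649304177638817792"]

# Write one row with ';' as the delimiter; fields containing quotes get
# wrapped in quotes and their inner quotes are doubled.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", quoting=csv.QUOTE_MINIMAL)
writer.writerow(row)
line = buffer.getvalue()
print(line)

# Reading the line back recovers the original text unchanged.
parsed = next(csv.reader(io.StringIO(line), delimiter=";"))
print(parsed[4] == text)
```

With QUOTE_MINIMAL the id field is left unquoted; csv.QUOTE_NONNUMERIC or csv.QUOTE_ALL would quote it as in the line above.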
Hi, is it possible to use your project under Python 3.5? I am on Python 3.5 and have issues with some of the packages used in your script. Please help me figure out how to solve this.
It seems that the output ( I used: python Exporter.py --username 'barackobama' --maxtweets 100) returns tweets by other users @BarackObama instead of tweets by barackobama. Is it possible to get tweets by the user using this?
Didn't modify the script at all, was running it on a particular query for a while and now it won't work at all. I tried re-cloning and running from command line and from within another script on Jupyter but all it does is return ''' Searching... \n Done. Output file generated "output_got.csv". ''' for the former and nothing for the latter...
Any ideas as to what happened? I visited Twitter and I don't seem to be blocked or anything.
Hi,
Unfortunately, most of the tweets are missing with query search, even with "allTweets=True". I have verified this against several sources (the Twitter API and Twitter itself in the browser).
Any ideas why?
Hi!
Thanks a lot for creating this amazing project!
When I ran the code in Main.py, I found that the collected tweets are repeated if "maxtweets" > 30. For instance, the 21st to 40th tweets are exactly the same as the first 20 tweets, including the time.
Am I missing some important code that would cause this problem?
Thanks!
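Until the cause of the repetition is known, duplicates can be filtered out after collection by tweet id. A minimal sketch, assuming each tweet is represented as a dict with an "id" key (that representation is an assumption for illustration, not the library's Tweet model):

```python
def dedupe_by_id(tweets):
    """Keep the first occurrence of each tweet id, preserving order."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet["id"] not in seen:
            seen.add(tweet["id"])
            unique.append(tweet)
    return unique

collected = [{"id": "1", "text": "a"}, {"id": "2", "text": "b"}, {"id": "1", "text": "a"}]
print(len(dedupe_by_id(collected)))
```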
Hi Jefferson,
I'm using your code in my Python project, which worked well until yesterday.
This morning I ran into an issue reported by other users too.
Basically, when I try to run this query
python Exporter.py --since 2015-02-13 --until 2015-02-19 --querysearch "technology, stock"
The output is always an empty CSV file. I tried to debug the problem and noticed a couple of points.
The first is that your code works well if the query covers tweets posted in the last 7 days.
Is it possible that there is a Twitter restriction here?
As for the second, I suspect the problem is in TweetManager.py line 21: the length of json['items_html'] is 0, so nothing is saved.
Any ideas about it?
Thank you
Giordano
Running under Py 2.7 throws an ASCII encoding issue on the file write if it hits a character it doesn't like:
Traceback (most recent call last):
File "Exporter.py", line 66, in <module>
main(sys.argv[1:])
File "Exporter.py", line 56, in main
outputFile.write('\n%s;%s;%d;%d;"%s";%s;%s;%s;"%s";%s' % (t.username, t.date.strftime("%Y-%m-%d %H:%M"), t.retweets, t.favorites, t.text, t.geo, t.mentions, t.hashtags, t.id, t.permalink))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position ...
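This error is typical of writing non-ASCII text to a file opened without an explicit encoding under Python 2. A sketch of the usual remedy, opening the file with io.open and encoding="utf-8" (append_line_utf8 is a hypothetical helper; under Python 2 the text passed in must be a unicode string):

```python
import io
import os
import tempfile

def append_line_utf8(path, text):
    """Append one line to path, encoding it as UTF-8 on the way out."""
    with io.open(path, "a", encoding="utf-8") as output_file:
        output_file.write(text + u"\n")

demo_path = os.path.join(tempfile.mkdtemp(), "output_demo.csv")
# u'\xeb' is the character from the traceback above.
append_line_utf8(demo_path, u"someone;2016-01-01 10:00;Zo\xeb says hello")
```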
Hello,
I see a difference between the result from this URL:
https://twitter.com/i/search/timeline?f=tweets&q=%20since%3A2014-01-01%20%23axa%2B%23environment&src=typd&max_position=
And the JSONResponse below:
{"min_position":"TWEET--","has_more_items":false,"items_html":"\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n","new_latent_count":0,"focused_refresh_interval":30000}
Do you experience the same issue?
It seems that it could be due to the Internet Service Provider, but I don't understand why.
Fen
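To compare what the browser and the script actually ask for, it can help to decode the q parameter of the URL. A small sketch using only the standard library (Python 3 urllib.parse); it decodes the URL from this post and makes no claim about how Twitter interprets the query:

```python
from urllib.parse import parse_qs, urlsplit

url = ("https://twitter.com/i/search/timeline?f=tweets"
       "&q=%20since%3A2014-01-01%20%23axa%2B%23environment&src=typd&max_position=")

# keep_blank_values=True keeps the empty max_position parameter visible.
params = parse_qs(urlsplit(url).query, keep_blank_values=True)
query = params["q"][0]
print(repr(query))
```

Printing the decoded query makes it easier to paste the exact same search into the browser.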
I'm running this in Canopy and I believe I have my search string correct, but I keep getting
ImportError: cannot import name Pseudo
I've tried installing this package manually, but I can't get past this error. I'm fairly new to Python. Is there something I'm doing wrong?
I am trying to get tweets that are addressed to a particular user, say Barack Obama. I tried the querysearch and username methods with the parameter " @BarackObama", but some tweets were missing from the CSV file. Could you tell me the proper way to do this?
command executing
python Exporter.py --querysearch '@RealDonaldTrump' --since 2016-06-11 --until 2016-06-18 --maxtweets 10000000
Twitter weird response. Try to see on browser: https://twitter.com/search?q=%20since%3A2016-06-11%20until%3A2016-06-18%20%27%40realDonaldTrump%27&src=typd
Done. Output file generated "output_got.csv".
Using ';' as the separator is a bad decision, because lots of tweets have ';' in their text, and .csv parsers get confused by that.
I had scripts running for days gathering tweets (.csv), and now that I want to use them with R, my data.frames are badly formatted. I've lost hours on data cleaning, and so far I can't even parse these CSV files!
Any ideas? I can't start gathering the data again because I need it for tomorrow, so I'm trying to figure out a way to clean this data.
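One possible way to salvage files like this, as a sketch: the outputFile.write format string quoted in a traceback on this page writes ten fields per line (username, date, retweets, favorites, "text", geo, mentions, hashtags, "id", permalink), so the text can be isolated by splitting four fields off the left and five off the right. split_got_line is a hypothetical helper, and it assumes that only the text field contains ';' and that each tweet sits on a single line.

```python
def split_got_line(line):
    """Split one exported line into 10 fields, tolerating ';' and '"' in the text."""
    # Four fields precede the text: username, date, retweets, favorites.
    head = line.rstrip("\r\n").split(";", 4)
    if len(head) != 5:
        raise ValueError("too few fields: %r" % line)
    # Five fields follow the text: geo, mentions, hashtags, id, permalink.
    tail = head[4].rsplit(";", 5)
    if len(tail) != 6:
        raise ValueError("too few fields: %r" % line)
    text = tail[0]
    # Drop the wrapping quotes that the exporter puts around the text.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        text = text[1:-1]
    return head[:4] + [text] + tail[1:]

sample = ('someone;2016-01-01 10:00;0;2;"wait; what "now"; ok";;;#demo;"123";'
          'https://twitter.com/someone/status/123')
fields = split_got_line(sample)
print(fields[4])
```

The recovered rows can then be rewritten with a proper CSV writer so that R or any other parser reads them cleanly. Tweets whose text contains line breaks would need extra handling that this sketch does not attempt.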
Using it with --toptweet returns a lot of tweets that are not really top ones, with 0 retweets and 0 likes each, and a gap of several seconds between one row and the next. How can one isolate only the n top tweets for each day, instead of having to fetch ~560-630 tweets to find one?
Setting since and until is not enough: setting those parameters to get a month of data with 100 observations returns data from the first day only.
I'm experiencing the same problem found in Issue #30 but while working with Python3.5 and got3.
May I ask how to use this code? I have no background in programming, but I'm doing a project on this topic. After I unzip these files, which one should I use? "main.py"? Can anyone help me with this?
Hello,
Is it possible to add a way to "log in" with an account and search users' tweets that are protected from unapproved followers?
Thanks!