jefferson-henrique / getoldtweets-python

A project written in Python to get old tweets; it bypasses some limitations of the official Twitter API.

License: MIT License

Python 100.00%

getoldtweets-python's Issues

language criteria

Hi, many thanks for this wonderful package!
I am very new to everything, including GitHub, Python, tweet scraping, etc., so sorry in advance if this is a dumb question. Is there a way to set a language criterion through the got package? There is no 'setLang'-style criterion in the Criteria file.

Many thanks
Shadi

Showing very few tweets

I just searched for #Raees, but it gave only 268 results. Same search with tweepy library gave 1012 results in 10 minutes.

No idea what the error is about :(

I have installed both lxml and pyquery on Windows 10.
I used this command at the command prompt. The Python installation directory is on the PATH.

python Exporter.py --username 'barackobama' --maxtweets 1
File "Exporter.py", line 8
print 'You must pass some parameters. Use "-h" to help.'
^
SyntaxError: Missing parentheses in call to 'print'

Kindly help me resolve this issue.
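
The traceback above is what Python 3 reports when it parses a Python 2 `print` statement, so the script is most likely being run by a Python 3 interpreter. A minimal sketch of the form that parses under both versions follows; `usage_message` is a hypothetical helper used only for illustration, and the message text is copied from the traceback.

```python
# A no-op on Python 3; on Python 2 it turns print into a function.
from __future__ import print_function


def usage_message():
    """Return the help hint quoted in the traceback above."""
    return 'You must pass some parameters. Use "-h" to help.'


# print(...) with parentheses is valid on Python 2 (with the import) and Python 3.
print(usage_message())
```

Running the original script unchanged would instead require a Python 2.7 interpreter.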

Getting Usertimeline

While using the following query to get twitter timeline by username (python Exporter.py --username "barackobama" --since 2015-09-10 --until 2015-09-12 --maxtweets 0) the output file includes tweets by a different username. Upon inspecting the webpage on twitter, I believe re-tweets and mentions which appear on one's timeline are also being crawled as belonging to the user. Does the crawler differentiate between the tweets on one's timeline? Thanks
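
One possible workaround, sketched here under assumptions, is to filter the results after fetching them so that only tweets authored by the requested account remain. `Tweet` is a hypothetical stand-in for the tweet objects the library returns; the only property relied on is a `username` attribute, which other reports on this page use as `tweet.username`.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's tweet objects.
Tweet = namedtuple("Tweet", ["username", "text"])


def only_authored_by(tweets, username):
    """Keep only tweets whose author matches username (case-insensitive)."""
    wanted = username.lstrip("@").lower()
    return [t for t in tweets if t.username.lower() == wanted]


sample = [
    Tweet("BarackObama", "an original tweet"),
    Tweet("someoneelse", "a tweet mentioning @BarackObama"),
]
own = only_authored_by(sample, "barackobama")
```

This does not tell retweets apart from original tweets; it only drops rows attributed to other accounts.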

gives empty csv file

Hi, there is another issue.

In some cases, your code produces an empty CSV file (not completely empty, it contains only the header).

I looked into it, and there seem to be two cases.

One is an account that hasn't tweeted for a long time. The other may be an IP ban: I used this code to crawl some organizations' Twitter accounts, and it sometimes gives empty CSV files.

Is there any way to avoid these cases, especially the first one (a long time without tweets)?

Install

Hi, I have a vanilla Debian OS and attempting to use your script. Can you share some instructions on the packages I would need to get setup to run the script?

For example: sudo pip install pyquery

Thanks

Getting tweets for locked accounts that you follow

Example scenario: I follow a user on twitter but their account is locked. Therefore I can see their tweets but cannot use the TweetManager to export them. Do you have any strategies to allow TweetManager to authenticate into my account before getting the JsonResponse?

Get tweets by query search bound range fails

This is a very useful piece of code. Great work .
I am looking forward to using the code for one of my data mining projects at school.

I downloaded the python code and ran the samples. But the Get tweets by query search/ username bound range fails

The same query works in the Java version of this project. I was wondering if I am missing something?

Basic help to use this - like a dummies example

Hi there,
I am no Python pro but would like to use this package. I've cloned the repo and navigated to the folder. I then launched it with

python Export.py --username 'barackobama' --since 2015-09-10 --until 2015-09-12 --maxtweets

I get the following error:
File "Exporter.py", line 6
print 'You must pass some parameters. Use "-h" to help.'
^
SyntaxError: Missing parentheses in call to 'print'

Can someone show me a simple example of using this?
Thanks

Tweets extraction limit?

What is the maximum number of tweets that can be extracted, and what is the maximum time span that can be chosen? I am trying to extract 1000 tweets from a one-month period (3 years ago), but I end up with only 1 tweet. How can I change the code to get the expected results?

No module named got

Hi @Jefferson-Henrique

Thanks a lot for releasing this code! It was working great till today when I started encountering ImportError: No module named got

Could you let me know how this can be fixed?

Thanks!

Has Twitter just disabled retrieval of tweets older than 10 days?

I have been using your code to retrieve tweets dating back a couple of years but just today I was not able to get any tweets older than 10 days. It still works with the web browser but not using your code. I wonder what changed?

getting JSON from Tweet Status is possible?

First of all, thank you for your code, it helped me a lot!

What I'm trying to do is get tweet statuses in JSON format so that I can collect the mentions of my Twitter account.

Is it possible to bypass the Twitter API with your embedded Twitter querying method for this case?
("https://twitter.com/i/search/timeline?f=realtime&q=%s&src=typd&max_position=%s")

I'm just trying to parse the HTML with regular expressions right now, but it is not working out smoothly.

Not able to fetch tweets

I ran the given code yesterday and it successfully fetched all the tweets between specific dates for the mentioned tags. However, I have the concerns below; please let me know your comments:

Are these all the tweets for the specified interval, or is there a limit per calendar day?
When I run the same script again with updated tags, I do not get even a single tweet. What is the reason for this?

Thanks!

bound dates suddenly not working

Hello:

I tried the last example today and it does not work anymore. It returned 0 tweets. Everything was fine yesterday, so do you know what the problem is?

Thank you

Get tweets by query search

Hello,
Firstly the project is perfect thank you.
I have a problem: I'm using Example 2, but as a result I got only one tweet. Should I change something to get all tweets for a keyword and a given time period?

Filter tweets by language

How can I filter tweets by language?
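
Twitter's search syntax has a `lang:` operator, alongside the `since:` and `until:` operators visible in the search URLs quoted elsewhere on this page. The sketch below only builds a query string and a browser search URL; `build_query` and `search_url` are hypothetical helpers, and whether this library passes such operators through unchanged is an assumption to verify.

```python
from urllib.parse import quote


def build_query(text, since=None, until=None, lang=None):
    """Join free text with optional Twitter search operators."""
    parts = [text]
    if since:
        parts.append("since:" + since)
    if until:
        parts.append("until:" + until)
    if lang:
        parts.append("lang:" + lang)
    return " ".join(parts)


def search_url(query):
    """Build a browser search URL in the style of the ones quoted on this page."""
    return "https://twitter.com/search?q=" + quote(query) + "&src=typd"


query = build_query("bitcoin", since="2017-01-01", until="2017-02-01", lang="en")
```

Opening the resulting URL in a browser is a quick way to check what the operators return before scripting anything.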

do we get retweets as well ?

@Jefferson-Henrique Does this search give Retweets by the user or just the original tweets ?

only returning tweets for last week?

I found out yesterday that Exporter.py is not actually giving back "all tweets".

Following your example, I crawled Twitter with username 'barackobama' and no other options such as 'maxtweet', and it gives only a handful of 25 tweets starting from a week ago.

Maybe this code was detected and blocked?

Only fetching Top results

Great work!

I had to change the url in TweetManager.py to:
url = "https://twitter.com/i/search/timeline?f=tweets&vertical=default&q=%s&src=typd&max_position=%s"

Using the original url, I only got the "Top" results from the search - not the "Live" ones.

Maybe twitter changed the http-address?

I am happy to push the corrected version if you wish?

Again thanks a lot for providing this great program!

setSince and setUntil issue

I think Twitter may have determined that I had been abusing their service. Yesterday, I ran your program to collect over 20,000 tweets. It ran fine. Each call involved a setSince and setUntil date and setMaxTweets to 10.

Today I changed setMaxTweets to 20. It worked for the first 10 calls or so and then stopped. I had modified your program to carry out some pre- and post-processing. I reverted to the original copy, latest version, and from what I can see it is not returning any tweets that have dates specified in the tweetCriteria. Have you ever encountered this? Any tips? I will try it from a different IP address later.

The error is as follows:
Traceback (most recent call last):
File "Main.py", line 32, in <module>
main()
File "Main.py", line 21, in main
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
IndexError: list index out of range

That is, no tweets are returned, but I am sure that some exist. This happens even in your program with the second query:

tweetCriteria = got.manager.TweetCriteria().setQuerySearch('europe refugees').setSince("2015-05-01").setUntil("2015-09-30").setMaxTweets(1)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
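
The `IndexError` in the traceback comes from indexing `[0]` into an empty result list, so it is a symptom of the empty response rather than its cause. A minimal sketch of a guard follows; `first_tweet` is a hypothetical helper, and the list passed to it stands in for whatever `getTweets` returned.

```python
def first_tweet(tweets):
    """Return the first item of a result list, or None when it is empty."""
    return tweets[0] if tweets else None


# An empty list stands in for a search that returned nothing.
result = first_tweet([])
if result is None:
    message = "No tweets returned for these criteria."
else:
    message = "Got a tweet."
```

With the guard in place the script can report the empty result and continue instead of stopping with a traceback.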

ImportError: No module named _socket

Dear Jefferson

When I install requirements.txt, I get an ImportError.

What can I do?

(venv) C:\Users\seoul1\Dropbox\Mari\GetOldTweets-python-master>pip install -r requirements.txt
Traceback (most recent call last):
File "C:\Python27\Lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "C:\Python27\Lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Users\seoul1\Dropbox\Mari\GetOldTweets-python-master\venv\Scripts\pip.exe\__main__.py", line 5, in <module>
File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\__init__.py", line 14, in <module>
from pip.utils import get_installed_distributions, get_prog
File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\utils\__init__.py", line 22, in <module>
from pip.compat import console_to_str, expanduser, stdlib_pkgs
File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\compat\__init__.py", line 13, in <module>
from pip.compat.dictconfig import dictConfig as logging_dictConfig
File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\compat\dictconfig.py", line 22, in <module>
import logging.handlers
File "C:\Python27\Lib\logging\handlers.py", line 26, in <module>
import errno, logging, socket, os, cPickle, struct, time, re
File "C:\Python27\Lib\socket.py", line 47, in <module>
import _socket
ImportError: No module named _socket

querysearch not working

I'm running this command
python Exporter.py --querysearch 'europe refugees' --maxtweets 10

and got this error

Done. Output file generated "output_got.csv".
Traceback (most recent call last):
  File "Exporter.py", line 79, in <module>
    main(sys.argv[1:])
  File "Exporter.py", line 70, in main
    got.manager.TweetManager.getTweets(tweetCriteria, receiveBuffer)
  File "C:\Users\RAHUL_MSI\Anaconda2\lib\site-packages\got\manager\TweetManager.py", line 18, in getTweets
    if (tweetCriteria.username.startswith("\'") or tweetCriteria.username.startswith("\"")) and (tweetCriteria.username.endswith("\'") or tweetCriteria.username.endswith("\"")):
AttributeError: TweetCriteria instance has no attribute 'username'

Working with Python 3.5?

Hi,

First of all, thank you for creating and sharing this module. I'm working on a project that requires specific tweets, and getting them only in the current week would take way too long, so I tried to take advantage of your solution. I'm using Python 3.5, and I unfortunately can't get it to work. I've got the requirements installed and I've also changed all of the print statements to match the new standard.

If I copy the contents of the got3 folder to the main directory and run the exporter, I receive the following error: "attempted relative import beyond top-level package". The line that's causing this problem is the following: "from .. import models" (in TweetManager).

Has anyone been able to use the script with Python 3.5 and would please help me out to get it working? Thanks in advance.

Location info

Hello,

I try to generate tweets that include user locations, but tweet.geo returns an empty string. And this is for users with location turned on. Any help?

Issue with fetching user timeline for longer periods

Hey,
I've been facing these two issues consistently. Any help in fixing/understanding these will be highly appreciated.

Tweets that appear from query search results are missing in the corresponding user timelines. This happens even after setting the topTweets to "False".
This issue has previously been posted. Getting user timeline for a longer time period, say 2-3 years results in an empty .csv file. Sometimes, it runs for a couple of months and terminates. Is this an ISP issue? I've tried reducing the time period to a month, a day but it still fails.

another dummies question

Hey there, i'm new to python but i'd like to use this package and learn with experience.

I typed python Exporter.py --username "barackobama" --since 2015-06-01 --until 2016-02-01 --maxtweets 1000

but i get this error:

Traceback (most recent call last):
File "Exporter.py", line 79, in <module>
main(sys.argv[1:])
File "Exporter.py", line 75, in main
outputFile.close()
UnboundLocalError: local variable 'outputFile' referenced before assignment

What have i not done / done wrongly? Sorry if this is a beginner's question. Appreciate your help.
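
An `UnboundLocalError` on `outputFile.close()` means the cleanup ran although the line that assigns `outputFile` never did, so some earlier step failed first and its error was masked. Which step failed is not visible in the report. The sketch below shows one common way to keep cleanup from hiding the original error, assuming the cleanup sits in a `finally` block; `export` and its arguments are hypothetical.

```python
import io
import os
import tempfile


def export(path, max_tweets_arg):
    output_file = None  # bound before the try, so cleanup can always test it
    try:
        max_tweets = int(max_tweets_arg)  # may fail before the file is opened
        output_file = io.open(path, "w", encoding="utf-8")
        output_file.write(u"maxtweets=%d" % max_tweets)
    finally:
        # Close only if the file was actually opened.
        if output_file is not None:
            output_file.close()


demo_path = os.path.join(tempfile.mkdtemp(), "output_got.csv")
try:
    export(demo_path, "not-a-number")
    outcome = "ok"
except ValueError:
    # The real cause surfaces instead of an UnboundLocalError.
    outcome = "ValueError"
```

A `with` block around the file would achieve the same thing with less code, provided the argument parsing happens before it.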

querysearch problem

Thanks for your work, Jefferson. I really need it right now, but it does not work normally from the command line.

When I input Exporter.py --querysearch '#microsoft' --since 2016-01-01 --until 2016-01-31 --maxtweets 0

I got only part of the tweets, not all of them: usually less than a two-day period ending on 2016-01-31. Can you help with this?
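
A workaround sometimes used for truncated ranges is to split the period into one-day windows and run one search per window. The sketch below only generates the date pairs; `daily_windows` is a hypothetical helper, and treating `until` as exclusive is an assumption about how the search interprets it.

```python
from datetime import datetime, timedelta


def daily_windows(since, until):
    """Yield (since, until) date strings covering one day each."""
    day = datetime.strptime(since, "%Y-%m-%d").date()
    end = datetime.strptime(until, "%Y-%m-%d").date()
    while day < end:
        next_day = day + timedelta(days=1)
        yield day.isoformat(), next_day.isoformat()
        day = next_day


windows = list(daily_windows("2016-01-01", "2016-01-04"))
```

Each pair can then be passed as the `--since` and `--until` values of a separate run, with the outputs concatenated afterwards.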

From command line, max count collection sometimes doesn't appear to stop?

On occasion, with a largish number of max_tweets and a search range way in the past, the command line utility appears to get stuck (or maybe twitter is slowing down how quickly it returns pages).

Might it be worth trying to be more defensive on how you break out of the while True collection loop? Or perhaps appending tweets to the export file as you go along (eg save in batches of every 100 or so)?
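
The two suggestions above can be sketched as a bounded loop that gives up after a few consecutive empty pages and flushes results in batches. This is an illustration, not the library's actual loop: `collect`, `fetch_page`, `write_batch`, and `max_empty_pages` are hypothetical names, and `fetch_page(cursor)` is assumed to return a list of tweets plus the next cursor.

```python
def collect(fetch_page, write_batch, max_tweets, batch_size=100, max_empty_pages=3):
    """Fetch pages until max_tweets is reached or pages keep coming back empty."""
    cursor = None
    buffer = []
    total = 0
    empty_pages = 0
    while total < max_tweets and empty_pages < max_empty_pages:
        tweets, cursor = fetch_page(cursor)
        if not tweets:
            empty_pages += 1  # tolerate a few empty pages, then stop
            continue
        empty_pages = 0
        tweets = tweets[: max_tweets - total]  # never exceed the requested count
        buffer.extend(tweets)
        total += len(tweets)
        if len(buffer) >= batch_size:
            write_batch(buffer)  # persist progress as we go
            buffer = []
    if buffer:
        write_batch(buffer)  # flush whatever is left
    return total


# Fake data source: three pages of 20 items, then nothing.
pages = [list(range(0, 20)), list(range(20, 40)), list(range(40, 60))]


def fake_fetch(cursor):
    index = cursor or 0
    if index < len(pages):
        return pages[index], index + 1
    return [], index


written = []
total = collect(fake_fetch, written.append, max_tweets=1000, batch_size=25)
```

Because batches are written as they fill, an interrupted run still leaves the tweets collected so far on disk.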

Geolocation

Hello,

Would you happen to know if your program returns a lat long (geolocation) of the twitter posts?
So far I have yet to have any strings returned under geo, but perhaps it is due to what I am searching.

Error caused by setUntil() --> Only certain dates work, others do not

When using setUntil(), only a few dates work, others return this error:
lxml.etree.ParserError: Document is empty
This is my code:

max_tweets = 20
tweetCriteria = got3.manager.TweetCriteria().setUntil("2017-02-02").setQuerySearch("bitcoin").setMaxTweets(max_tweets).setLang(Lang="en")
for i in range(max_tweets):
    tweet = got3.manager.TweetManager.getTweets(tweetCriteria)[i]
    print(tweet.text)
    print(tweet.id)
    print(tweet.username)
    print(tweet.date)

Here is a little list of dates which work or not:
2017-02-05 error
2017-02-04 error
2017-02-03 ok
2017-02-02 error
2017-02-01 ok
2017-01-31 error
2017-01-30 ok
2017-01-29 ok
2017-01-28 error
2017-01-26 error
2017-01-25 ok

Also when I change the max_tweets to > 20 I get the same error!
Anybody an idea what happens here? Many thanks in advance.

It will be great to get all tweets containing a hashtag rather than limited to a specific username.

It would be great to be able to get tweets that contain a particular hashtag, rather than being limited to a username.

Problems with command shell

Hello,
I am trying to save the tweets as a CSV file, but I just get a SyntaxError in the command shell.

>>> python Exporter.py -h
SyntaxError: invalid syntax

I use Python 2.7.11

The next question is... I receive just 1 tweet. I am a beginner with Python and I don't know which numbers need to be changed.

Thanks.

Double-quotes not escaped inside tweet body within CSV

For example, retrieving this tweet:

mcantor;2015-09-30 15:25;0;2;""It may take 24 hours to process your unsubscribe request" that's okay, it only takes 24 milliseconds to click the Report Spam button.";;;;"649304177638817792";https://twitter.com/mcantor/status/649304177638817792

Not sure if this is an issue with GetOldTweets-python or the CSV library it uses (does it use a CSV library?). But I believe double-quotes are supposed to be escaped by being repeated. So it should be

mcantor;2015-09-30 15:25;0;2;"""It may take 24 hours to process your unsubscribe request"" that's okay, it only takes 24 milliseconds to click the Report Spam button.";;;;"649304177638817792";https://twitter.com/mcantor/status/649304177638817792
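
Python's standard `csv` module applies exactly this doubling rule by default. The sketch below writes a shortened version of the row above with `;` as the delimiter and reads it back; `write_rows` is a hypothetical helper, and nothing here is taken from the project's own export code.

```python
import csv
import io


def write_rows(stream, rows):
    """Write rows with ';' as delimiter, quoting and doubling quotes as needed."""
    writer = csv.writer(
        stream, delimiter=";", quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
    )
    writer.writerows(rows)


row = [
    "mcantor",
    "2015-09-30 15:25",
    '"It may take 24 hours to process your unsubscribe request" that\'s okay, '
    "it only takes 24 milliseconds to click the Report Spam button.",
]
buffer = io.StringIO()
write_rows(buffer, [row])
serialized = buffer.getvalue()

# Reading it back with the same delimiter recovers the original fields.
parsed = next(csv.reader(io.StringIO(serialized), delimiter=";"))
```

The same writer also quotes any field that contains the delimiter, so a `;` inside a tweet no longer shifts the columns.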

Python 3.5 issue

Hi, is it possible to use your project under Python 3.5? I am using Python 3.5 and have issues with some of the packages used in your script. Please help me figure out how to solve this.

Clarification:

It seems that the output ( I used: python Exporter.py --username 'barackobama' --maxtweets 100) returns tweets by other users @BarackObama instead of tweets by barackobama. Is it possible to get tweets by the user using this?

Was working fine, now when I run (from script or command line) it, nothing happens...

Didn't modify the script at all, was running it on a particular query for a while and now it won't work at all. I tried re-cloning and running from command line and from within another script on Jupyter but all it does is return ''' Searching... \n Done. Output file generated "output_got.csv". ''' for the former and nothing for the latter...

Any ideas as to what happened? I visited Twitter and I don't seem to be blocked or anything.

Most of the tweets are missing

Hi,

Unfortunately, most of the tweets are missing from the query search results even with "allTweets=True". I have verified this against several sources (the Twitter API and Twitter itself in the browser).

Any ideas why?

Repeated tweets

Hi!
Thanks a lot for creating this amazing project!

When I ran the code in Main.py, I found that the collected tweets are repeated if "maxtweets" > 30. For instance, the 21st to 40th tweets are exactly the same as the first 20 tweets, including the time.

Am I missing some important code that would avoid this problem?

Thanks!
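
Until the cause of the repetition is found, duplicates can be dropped after collection by keying on the tweet id. A minimal sketch, assuming each tweet object has an `id` attribute (other reports on this page print `tweet.id`); `Tweet` is a hypothetical stand-in and `dedupe_by_id` is not part of the library.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's tweet objects.
Tweet = namedtuple("Tweet", ["id", "text"])


def dedupe_by_id(tweets):
    """Return tweets in their original order, keeping the first copy of each id."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet.id not in seen:
            seen.add(tweet.id)
            unique.append(tweet)
    return unique


first_page = [Tweet(str(n), "tweet %d" % n) for n in range(20)]
collected = first_page + first_page  # the same page returned twice
unique = dedupe_by_id(collected)
```

This hides the symptom only; a repeated page still counts against the requested maximum while it is being fetched.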

Not able to get tweets data

Hi Jefferson,
I'm using your code in my Python project, which worked well until yesterday.
This morning I ran into an issue that other users have reported too.
Basically, when I try to run this query
python Exporter.py --since 2015-02-13 --until 2015-02-19 --querysearch "technology, stock"
the output is always an empty CSV file. I tried to debug the problem and noticed a couple of points:
The first is that your code works well if the query concerns tweets posted in the last 7 days.
Is it possible that there is a Twitter restriction here?
As for the second, I suspect the problem is in TweetManager.py line 21: the length of json['items_html'] is equal to 0, so nothing is saved.

Any ideas about it?

Thank you

Giordano

ASCII Codec issue

Running under Py 2.7 throws an ASCII encoding issue on the file write if it hits a character it doesn't like:

Traceback (most recent call last):
  File "Exporter.py", line 66, in <module>
    main(sys.argv[1:])
  File "Exporter.py", line 56, in main
    outputFile.write('\n%s;%s;%d;%d;"%s";%s;%s;%s;"%s";%s' % (t.username, t.date.strftime("%Y-%m-%d %H:%M"), t.retweets, t.favorites, t.text, t.geo, t.mentions, t.hashtags, t.id, t.permalink))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position ...
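
On Python 2.7 a file opened with the built-in `open` encodes unicode text as ASCII when writing, which fails on characters such as `u'\xeb'`. One common fix is to open the file with an explicit encoding via `io.open`, which behaves the same on Python 2.7 and 3. The sketch below uses a hypothetical two-column `export` helper rather than the project's real row format.

```python
import io
import os
import tempfile


def export(path, rows):
    """Write rows as UTF-8 text so non-ASCII characters are encoded correctly."""
    with io.open(path, "w", encoding="utf-8") as output_file:
        output_file.write(u"username;text")
        for username, text in rows:
            output_file.write(u'\n%s;"%s"' % (username, text))


demo_path = os.path.join(tempfile.mkdtemp(), "output_got.csv")
export(demo_path, [(u"someuser", u"Citro\xebn")])

with io.open(demo_path, encoding="utf-8") as handle:
    content = handle.read()
```

Whatever reads the file afterwards has to be told that it is UTF-8 as well.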

Urllib2 not parsing the tweets

Hello,

I see a difference between the result from this URL:
https://twitter.com/i/search/timeline?f=tweets&q=%20since%3A2014-01-01%20%23axa%2B%23environment&src=typd&max_position=

And the JSONResponse below:
{"min_position":"TWEET--","has_more_items":false,"items_html":"\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n","new_latent_count":0,"focused_refresh_interval":30000}

Do you experience the same issue ?

It seems that it could be due to the Internet Service Provider, but I don't understand why.

Fen

cannot import name Pseudo

I'm running this in Canopy and I believe I have my search string correct, but I keep getting

ImportError: cannot import name Pseudo

I've tried installing this package manually, but I can't get past this error. I'm fairly new to Python. Is there something I'm doing wrong?

Get all tweets which mention a particular user

I am trying to get tweets which are addressed to a particular user, say Barack Obama. I tried the querysearch and username methods with the parameter " @BarackObama", but some tweets were missing in the CSV file. So, could you tell me the proper way to do this?

Getting this error quite frequently: "Twitter weird response."

command executing

python Exporter.py --querysearch '@RealDonaldTrump' --since 2016-06-11 --until 2016-06-18 --maxtweets 10000000

Twitter weird response. Try to see on browser: https://twitter.com/search?q=%20since%3A2016-06-11%20until%3A2016-06-18%20%27%40realDonaldTrump%27&src=typd

Done. Output file generated "output_got.csv".

Badly chosen separator for csv

Using ';' as the separator is a bad decision because lots of tweets have ';' in their text, and .csv parsers get confused by that.

I had scripts running for days gathering tweets (.csv), and now that I want to use them with R, my data.frames are badly formatted. I've lost hours to data cleaning and so far I can't even parse these CSV files!

Any ideas? I can't start gathering the data again because I need it for tomorrow, so I'm trying to figure out a way to clean this data.
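
Files already collected may be recoverable because only the text field is free-form. Going by the `outputFile.write` format string quoted in a traceback elsewhere on this page, each row has four fields before the text and five after it, so a row can be split from both ends. The sketch assumes that layout, one tweet per line, and no `;` in the other fields; `split_got_row` is a hypothetical helper.

```python
def split_got_row(line):
    """Split one exported row into 10 fields, tolerating ';' and '"' in the text."""
    head = line.rstrip("\n").split(";", 4)  # username, date, retweets, favorites, rest
    if len(head) != 5:
        raise ValueError("too few fields before the text")
    rest = head.pop()
    tail = rest.rsplit(";", 5)  # text, geo, mentions, hashtags, id, permalink
    if len(tail) != 6:
        raise ValueError("too few fields after the text")
    text = tail[0]
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]  # drop the enclosing quotes only
    tweet_id = tail[4].strip('"')
    return head + [text] + tail[1:4] + [tweet_id, tail[5]]


line = (
    'user;2015-09-30 15:25;0;2;"a; b "quoted"; c";;;;'
    '"649304177638817792";https://twitter.com/user/status/649304177638817792'
)
fields = split_got_row(line)
```

The recovered fields can then be rewritten with a proper CSV writer before loading them into R.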

getting toptweet for a given time range

Using it with --toptweet returns a lot of tweets that are not really top ones, with 0 retweets and 0 likes each, and with a gap of several seconds to the next row. How can one isolate only the n top-most tweets :) for each day, instead of having to fetch ~560-630 tweets to find one?

Setting since and until is not enough: setting those parameters to get a month of data with 100 observations only returns data from the first day.

Thanks!
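
One way to approximate "top n per day" is to collect tweets for the whole period and rank them locally. The sketch below groups by calendar day and keeps the n tweets with the most retweets, using favorites as a tie-breaker; `Tweet` and `top_per_day` are hypothetical, and the attribute names `date`, `retweets`, and `favorites` are taken from the export format string quoted elsewhere on this page.

```python
import heapq
from collections import defaultdict, namedtuple
from datetime import datetime

# Hypothetical stand-in for the library's tweet objects.
Tweet = namedtuple("Tweet", ["id", "date", "retweets", "favorites"])


def top_per_day(tweets, n=1):
    """Map each calendar day to its n tweets with the most retweets."""
    by_day = defaultdict(list)
    for tweet in tweets:
        by_day[tweet.date.date()].append(tweet)
    return {
        day: heapq.nlargest(n, group, key=lambda t: (t.retweets, t.favorites))
        for day, group in by_day.items()
    }


sample = [
    Tweet("1", datetime(2016, 1, 1, 9, 0), 5, 1),
    Tweet("2", datetime(2016, 1, 1, 18, 0), 50, 3),
    Tweet("3", datetime(2016, 1, 2, 8, 0), 0, 0),
]
best = top_per_day(sample, n=1)
```

The ranking is only as complete as the collection step, so days that the search never reached will be missing from the result.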
