getoldtweets-python's People

Contributors

bmjr, bubavv, dorfman, fernandoramacciotti, jefferson-henrique, jfabdo, mattiasostmar, phaerus

getoldtweets-python's Issues

language criteria

Hi, many thanks for this wonderful package!
I am a beginner at everything involved here, including GitHub, Python, and tweet scraping, so sorry in advance if this is a dumb question. Is there a way to set a language criterion through the got package? There is no 'setLang'-style criterion in the Criteria file.

Many thanks
Shadi

No idea what the error is about :(

I have installed both lxml and pyquery on Windows 10.
I ran this command at the command prompt. The Python installation directory is on the PATH.

python Exporter.py --username 'barackobama' --maxtweets 1
File "Exporter.py", line 8
print 'You must pass some parameters. Use "-h" to help.'
^
SyntaxError: Missing parentheses in call to 'print'

Kindly help me resolve this issue.
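
The message "Missing parentheses in call to 'print'" is what Python 3 reports when it meets a Python 2 print statement, so the script is most likely being run with a Python 3 interpreter. A minimal sketch of a form that runs on both versions follows; the helper name `usage_message` is illustrative and not part of the project:

```python
from __future__ import print_function  # enables print() on Python 2, harmless on Python 3


def usage_message():
    # Same text as the failing line in Exporter.py, returned so it can be printed or tested.
    return 'You must pass some parameters. Use "-h" to help.'


if __name__ == "__main__":
    # print is called as a function, which both Python 2.7 and Python 3 accept here.
    print(usage_message())
```

Running the unmodified script with a Python 2.7 interpreter is the other option.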

Getting Usertimeline

When I use the following query to get a Twitter timeline by username (python Exporter.py --username "barackobama" --since 2015-09-10 --until 2015-09-12 --maxtweets 0), the output file includes tweets by a different username. After inspecting the page on Twitter, I believe retweets and mentions that appear on the user's timeline are also being crawled as if they belonged to the user. Does the crawler differentiate between the kinds of tweets on a timeline? Thanks

gives empty csv file

Hi, there is another issue.

In some cases, your code produces an empty CSV file (not completely empty: it contains only the header).

I looked into it, and there seem to be two cases.

One is an account that hasn't tweeted for a long time. The other may be an IP ban: I used this code to crawl the Twitter accounts of some organizations, and it sometimes gives empty CSV files.

Is there any way to avoid these cases, especially the first one (an account with no tweets for a long time)?

Install

Hi, I have a vanilla Debian OS and am attempting to use your script. Can you share some instructions on the packages I would need to set up in order to run the script?

For example: sudo pip install pyquery

Thanks

Getting tweets for locked accounts that you follow

Example scenario: I follow a user on twitter but their account is locked. Therefore I can see their tweets but cannot use the TweetManager to export them. Do you have any strategies to allow TweetManager to authenticate into my account before getting the JsonResponse?

Get tweets by query search bound range fails

This is a very useful piece of code. Great work.
I am looking forward to using the code for one of my data mining projects at school.

I downloaded the Python code and ran the samples, but the "Get tweets by query search" / "username" examples with a bound date range fail.

The same query works in the Java version of this project. I was wondering if I am missing something?

Basic help to use this - like a dummies example

Hi there,
I am no Python pro but would like to use this package. I've cloned the repo and navigated to the folder. I then launched it with

python Export.py --username 'barackobama' --since 2015-09-10 --until 2015-09-12 --maxtweets

I get the following error:
File "Exporter.py", line 6
print 'You must pass some parameters. Use "-h" to help.'
^
SyntaxError: Missing parentheses in call to 'print'

Can someone show me a simple example of using this?
Thanks

Tweets extraction limit?

What is the maximum number of tweets that can be extracted, and what is the maximum time span that can be chosen? I am trying to extract 1000 tweets from a one-month period three years ago, but I end up with only 1 tweet. How should I change the code to get the expected results?

No module named got

Hi @Jefferson-Henrique

Thanks a lot for releasing this code! It was working great till today when I started encountering ImportError: No module named got

Could you let me know how this can be fixed?

Thanks!

Not able to fetch tweets

I ran the given code yesterday and it successfully fetched all the tweets between specific dates for the mentioned tags. However, I have the concerns below; please let me know your comments:

  1. Are these all of the tweets for the specified interval, or is there a limit per calendar day?
  2. When I run the same script again with updated tags, I do not get even a single tweet. What is the reason for this?

Thanks!

bound dates suddenly not working

Hello:

I tried the last example today and it does not work anymore. It returned 0 tweets. Everything was fine yesterday, so do you know what the problem is?

Thank you

Get tweets by query search

Hello,
First of all, the project is perfect, thank you.
I have a problem: I'm using Example 2, but as a result I get only one tweet. Should I change something to get all tweets for a keyword and a given time period?

only returning tweets for last week?

I found out yesterday that Exporter.py is not actually giving back "all tweets".

Following your example, I crawled Twitter with username 'barackobama' and no other options such as 'maxtweets'. It returned only a handful of tweets (25), starting from a week ago.

Maybe this code is being detected and blocked?

setSince and setUntil issue

I think Twitter may have determined that I had been abusing their service. Yesterday, I ran your program to collect over 20,000 tweets. It ran fine. Each call used setSince and setUntil dates, with setMaxTweets set to 10.

Today I changed setMaxTweets to 20. It worked for the first 10 calls or so and then stopped. I had modified your program to carry out some pre- and post-processing. I reverted to the original copy (latest version), and from what I can see it does not return any tweets when dates are specified in the tweetCriteria. Have you ever encountered this? Any tips? I will try it from a different IP address later.

The error is as follows:
Traceback (most recent call last):
File "Main.py", line 32, in <module>
main()
File "Main.py", line 21, in main
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
IndexError: list index out of range

That is, no tweets are returned, although I am sure that some exist. This happens even with the second query in your program:

tweetCriteria = got.manager.TweetCriteria().setQuerySearch('europe refugees').setSince("2015-05-01").setUntil("2015-09-30").setMaxTweets(1)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
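
The `IndexError` itself comes from indexing `[0]` on an empty result list. Whatever causes the empty result, the caller can avoid the crash by checking before indexing. A minimal sketch, using a plain list in place of the value returned by `getTweets` (the helper `first_or_none` is hypothetical, not part of the package):

```python
def first_or_none(items):
    """Return the first element of a list, or None when the list is empty."""
    return items[0] if items else None


# Stand-in for the list returned by getTweets(); an empty list models "no tweets".
tweets = []
tweet = first_or_none(tweets)
if tweet is None:
    print("No tweets returned for these criteria.")
```

This only turns the crash into a readable message; it does not explain why the search came back empty.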

ImportError: No module named _socket

Dear Jefferson

When I install requirements.txt, I get an "ImportError".

What can I do?

(venv) C:\Users\seoul1\Dropbox\Mari\GetOldTweets-python-master>pip install -r requirements.txt
Traceback (most recent call last):
File "C:\Python27\Lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "C:\Python27\Lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\Users\seoul1\Dropbox\Mari\GetOldTweets-python-master\venv\Scripts\pip.exe\__main__.py", line 5, in <module>
File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\__init__.py", line 14, in <module>
from pip.utils import get_installed_distributions, get_prog
File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\utils\__init__.py", line 22, in <module>
from pip.compat import console_to_str, expanduser, stdlib_pkgs
File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\compat\__init__.py", line 13, in <module>
from pip.compat.dictconfig import dictConfig as logging_dictConfig
File "c:\users\seoul1\dropbox\mari\getoldtweets-python-master\venv\lib\site-packages\pip\compat\dictconfig.py", line 22, in <module>
import logging.handlers
File "C:\Python27\Lib\logging\handlers.py", line 26, in <module>
import errno, logging, socket, os, cPickle, struct, time, re
File "C:\Python27\Lib\socket.py", line 47, in <module>
import _socket
ImportError: No module named _socket

querysearch not working

I'm running this command:
python Exporter.py --querysearch 'europe refugees' --maxtweets 10

and got this error

Done. Output file generated "output_got.csv".
Traceback (most recent call last):
  File "Exporter.py", line 79, in <module>
    main(sys.argv[1:])
  File "Exporter.py", line 70, in main
    got.manager.TweetManager.getTweets(tweetCriteria, receiveBuffer)
  File "C:\Users\RAHUL_MSI\Anaconda2\lib\site-packages\got\manager\TweetManager.py", line 18, in getTweets
    if (tweetCriteria.username.startswith("\'") or tweetCriteria.username.startswith("\"")) and (tweetCriteria.username.endswith("\'") or tweetCriteria.username.endswith("\"")):
AttributeError: TweetCriteria instance has no attribute 'username'
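
The traceback suggests that `username` is only set on the criteria object when `--username` is passed, so a query-only search reaches the quote check without that attribute. One defensive pattern is to read the attribute with a default. The sketch below uses a stand-in class `Criteria` and a hypothetical helper `clean_username`; neither is the package's own code:

```python
class Criteria(object):
    """Stand-in for a criteria object whose attributes are only set on demand."""
    pass


def clean_username(criteria):
    # getattr with a default avoids AttributeError when 'username' was never set.
    username = getattr(criteria, "username", None)
    if username is None:
        return None
    # Strip one pair of surrounding quotes, similar in spirit to the check in the traceback.
    if len(username) >= 2 and username[0] in "'\"" and username[-1] in "'\"":
        username = username[1:-1]
    return username
```

With this shape, a search that sets only a query string simply gets `None` back for the username.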

Working with Python 3.5?

Hi,

First of all, thank you for creating and sharing this module. I'm working on a project that requires specific tweets, and getting them only from the current week would take way too long, so I tried to take advantage of your solution. I'm using Python 3.5, and I unfortunately can't get it to work. I've got the requirements installed and I've also changed all of the print statements to match the new standard.

If I copy the contents of the got3 folder to the main directory and run the exporter, I receive the following error: "attempted relative import beyond top-level package". The line that's causing this problem is the following: "from .. import models" (in TweetManager).

Has anyone been able to use the script with Python 3.5 and would please help me out to get it working? Thanks in advance.

Location info

Hello,

I try to generate tweets that include user locations, but tweet.geo returns an empty string. And this is for users with location turned on. Any help?

Issue with fetching user timeline for longer periods

Hey,
I've been facing these two issues consistently. Any help in fixing/understanding these will be highly appreciated.

  1. Tweets that appear from query search results are missing in the corresponding user timelines. This happens even after setting the topTweets to "False".
  2. This issue has previously been posted. Getting user timeline for a longer time period, say 2-3 years results in an empty .csv file. Sometimes, it runs for a couple of months and terminates. Is this an ISP issue? I've tried reducing the time period to a month, a day but it still fails.

another dummies question

Hey there, I'm new to Python but I'd like to use this package and learn from experience.

I typed python Exporter.py --username "barackobama" --since 2015-06-01 --until 2016-02-01 --maxtweets 1000\n

but I get this error:

Traceback (most recent call last):
File "Exporter.py", line 79, in <module>
main(sys.argv[1:])
File "Exporter.py", line 75, in main
outputFile.close()
UnboundLocalError: local variable 'outputFile' referenced before assignment

What have I not done, or done wrongly? Sorry if this is a beginner's question. I appreciate your help.
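
The traceback shows `outputFile.close()` running although `outputFile` was never assigned, which usually means an earlier step failed before the file was opened and cleanup code then ran anyway. The structure of Exporter.py is not shown here, so the following is only a sketch of a pattern that avoids this class of error; `export_rows` and its two-column layout are made up for illustration:

```python
import io
import os
import tempfile


def export_rows(path, rows):
    # The with-statement closes the file on exit, so no separate close() call
    # can run against a handle that was never opened.
    with io.open(path, "w", encoding="utf-8") as output_file:
        output_file.write(u"username;date\n")
        for username, date in rows:
            output_file.write(u"%s;%s\n" % (username, date))


# Write to a temporary directory so the example leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), "output_got.csv")
export_rows(path, [(u"barackobama", u"2015-09-10 12:00")])
```

With this shape, the original failure (whatever it was) surfaces as its own traceback instead of being hidden behind the `UnboundLocalError`.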

querysearch problem

Thanks for your work, Jefferson. I really need it right now, but it does not work properly from the command line.

When I input Exporter.py --querysearch '#microsoft' --since 2016-01-01 --until 2016-01-31 --maxtweets 0

I get only part of the tweets, not all of them: usually a period of less than two days ending on 2016-01-31. Can you help with this?
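
Several reports here describe getting back only a slice of the requested date range. One workaround that might help, and which is not confirmed anywhere in this thread, is to split the range into one-day windows and run one search per window. The sketch below only generates the windows; `daily_windows` is a hypothetical helper, and treating the `until` date as exclusive is an assumption:

```python
from datetime import date, timedelta


def daily_windows(since, until):
    """Yield (since, until) ISO date strings covering [since, until) one day at a time."""
    day = since
    while day < until:
        next_day = day + timedelta(days=1)
        yield day.isoformat(), next_day.isoformat()
        day = next_day


# The range from the command above, split into consecutive one-day windows.
windows = list(daily_windows(date(2016, 1, 1), date(2016, 1, 31)))
```

Each pair could then be passed as the `--since` and `--until` values of a separate run.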

From command line, max count collection sometimes doesn't appear to stop?

On occasion, with a largish number of max_tweets and a search range way in the past, the command line utility appears to get stuck (or maybe twitter is slowing down how quickly it returns pages).

Might it be worth trying to be more defensive on how you break out of the while True collection loop? Or perhaps appending tweets to the export file as you go along (eg save in batches of every 100 or so)?
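
A rough sketch of what that suggestion could look like: stop after a few consecutive empty pages, and hand tweets to a writer in batches so that partial results survive a stall. All names here (`collect`, `fetch_page`, `write_batch`, `max_empty_pages`) are hypothetical and not taken from the package:

```python
def collect(fetch_page, max_tweets, write_batch, batch_size=100, max_empty_pages=3):
    """Pull pages until max_tweets is reached or several empty pages arrive in a row."""
    total, empty_pages, pending = 0, 0, []
    while total < max_tweets and empty_pages < max_empty_pages:
        page = fetch_page()
        if not page:
            # Count consecutive empty pages instead of looping forever.
            empty_pages += 1
            continue
        empty_pages = 0
        for tweet in page:
            if total >= max_tweets:
                break
            pending.append(tweet)
            total += 1
            if len(pending) >= batch_size:
                write_batch(pending)  # flush a full batch to the output
                pending = []
    if pending:
        write_batch(pending)  # flush whatever is left at the end
    return total
```

The sketch expects a positive `max_tweets`; the command line's `--maxtweets 0` convention seen elsewhere on this page is not handled.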

Geolocation

Hello,

Would you happen to know if your program returns a lat long (geolocation) of the twitter posts?
So far I have yet to have any strings returned under geo, but perhaps it is due to what I am searching.

Error caused by setUntil() --> Only certain dates work, others do not

When using setUntil(), only a few dates work, others return this error:
lxml.etree.ParserError: Document is empty
This is my code:

max_tweets = 20
tweetCriteria = got3.manager.TweetCriteria().setUntil("2017-02-02").setQuerySearch("bitcoin").setMaxTweets(max_tweets).setLang(Lang="en")
for i in range(max_tweets):
    tweet = got3.manager.TweetManager.getTweets(tweetCriteria)[i]
    print(tweet.text)
    print(tweet.id)
    print(tweet.username)
    print(tweet.date)

Here is a little list of dates which work or not:
2017-02-05 error
2017-02-04 error
2017-02-03 ok
2017-02-02 error
2017-02-01 ok
2017-01-31 error
2017-01-30 ok
2017-01-29 ok
2017-01-28 error
2017-01-26 error
2017-01-25 ok

Also, when I change max_tweets to a value greater than 20, I get the same error!
Does anybody have an idea what is happening here? Many thanks in advance.

Problems with command shell

Hello,
I am trying to save the tweets as a CSV file, but I just receive a SyntaxError in the command shell.

>>> python Exporter.py -h
SyntaxError: invalid syntax

I use Python 2.7.11

The next question is: I receive just 1 tweet. I am a beginner with Python and I don't know which numbers need to be changed.

Thanks.

Double-quotes not escaped inside tweet body within CSV

For example, retrieving this tweet:

mcantor;2015-09-30 15:25;0;2;""It may take 24 hours to process your unsubscribe request" that's okay, it only takes 24 milliseconds to click the Report Spam button.";;;;"649304177638817792";https://twitter.com/mcantor/status/649304177638817792

Not sure if this is an issue with GetOldTweets-python or the CSV library it uses (does it use a CSV library?). But I believe double quotes are supposed to be escaped by being repeated, so it should be:

mcantor;2015-09-30 15:25;0;2;"""It may take 24 hours to process your unsubscribe request"" that's okay, it only takes 24 milliseconds to click the Report Spam button.";;;;"649304177638817792";https://twitter.com/mcantor/status/649304177638817792
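
For comparison, Python's standard `csv` module applies exactly this doubling rule when it writes a field that contains the quote character. A small Python 3 sketch using the tweet above (the row layout mirrors the exported line; writing to an in-memory buffer is only for illustration):

```python
import csv
import io

row = ["mcantor", "2015-09-30 15:25", 0, 2,
       '"It may take 24 hours to process your unsubscribe request" that\'s okay, '
       "it only takes 24 milliseconds to click the Report Spam button.",
       "", "", "", "649304177638817792",
       "https://twitter.com/mcantor/status/649304177638817792"]

buffer = io.StringIO()
# QUOTE_MINIMAL quotes only fields that need it and doubles embedded quotes.
writer = csv.writer(buffer, delimiter=";", quoting=csv.QUOTE_MINIMAL)
writer.writerow(row)
line = buffer.getvalue()

# Reading the line back recovers the original text field, quotes included.
parsed = next(csv.reader(io.StringIO(line), delimiter=";"))
```

The same writer also quotes fields that contain the `;` delimiter, which is relevant to the separator issue reported further down.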

Python 3.5 issue

Hi, is it possible to use your project under Python 3.5? I am running Python 3.5 and I have issues with some of the packages used in your script. Please help me figure out how to solve this.

Clarification:

It seems that the output (I used: python Exporter.py --username 'barackobama' --maxtweets 100) contains tweets by other users addressed to @BarackObama instead of tweets by barackobama. Is it possible to get only the tweets written by the user with this tool?

Was working fine, now when I run (from script or command line) it, nothing happens...

Didn't modify the script at all, was running it on a particular query for a while and now it won't work at all. I tried re-cloning and running from command line and from within another script on Jupyter but all it does is return ''' Searching... \n Done. Output file generated "output_got.csv". ''' for the former and nothing for the latter...

Any ideas as to what happened? I visited Twitter and I don't seem to be blocked or anything.

Most of the tweets are missing

Hi,

Unfortunately, most of the tweets are missing from query search results even though "allTweets=True". I have verified this against several sources (the Twitter API and Twitter itself in the browser).

Any ideas why?

Repeated tweets

Hi!
Thanks a lot for creating this amazing project!

When I ran the code in Main.py, I found that the collected tweets were repeated if "maxtweets" was greater than 30. For instance, the 21st to 40th tweets were exactly the same as the first 20 tweets, including the time.

Am I missing some important piece of code that would explain the problem?

Thanks!
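
Until the cause of the repetition is known, duplicates can at least be filtered out after collection by keeping the first tweet seen for each id. A minimal sketch; the `Tweet` namedtuple is a stand-in with only the fields needed here, and `dedupe_by_id` is a hypothetical helper:

```python
from collections import namedtuple

# Stand-in for the package's tweet objects; only 'id' matters for this sketch.
Tweet = namedtuple("Tweet", ["id", "text"])


def dedupe_by_id(tweets):
    """Keep the first occurrence of each tweet id, preserving order."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet.id not in seen:
            seen.add(tweet.id)
            unique.append(tweet)
    return unique
```

This hides the symptom only: if the same page is being fetched twice, the tweets that should have been on the second page are still missing.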

Not able to get tweets data

Hi Jefferson,
I'm using your code in my Python project, which worked well until yesterday.
This morning I ran into an issue that other users have reported too.
Basically, when I try to run this query
python Exporter.py --since 2015-02-13 --until 2015-02-19 --querysearch "technology, stock"
the output is always an empty CSV file. I tried to debug the problem and noticed a couple of points.
The first is that your code works well if the query covers tweets posted in the last 7 days.
Is it possible that there is a Twitter restriction here?
As for the second, I suspect the problem is in TweetManager.py line 21: the length of json['items_html'] is equal to 0, so nothing is saved.

Any ideas about it?

Thank you

Giordano

ASCII Codec issue

Running under Py 2.7 throws an ASCII encoding issue on the file write if it hits a character it doesn't like:

Traceback (most recent call last):
  File "Exporter.py", line 66, in <module>
    main(sys.argv[1:])
  File "Exporter.py", line 56, in main
    outputFile.write('\n%s;%s;%d;%d;"%s";%s;%s;%s;"%s";%s' % (t.username, t.date.strftime("%Y-%m-%d %H:%M"), t.retweets, t.favorites, t.text, t.geo, t.mentions, t.hashtags, t.id, t.permalink))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position ...
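
Under Python 2.7, a file opened with the built-in `open` encodes unicode text with the ASCII codec on write, which fails on characters such as `u'\xeb'`. Opening the file with `io.open` and an explicit encoding avoids that. A small sketch that runs on both Python 2.7 and 3 (the file name and the two-field row are illustrative only):

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output_got.csv")
text = u"Zo\xeb"  # contains u'\xeb', the character from the traceback

# io.open accepts unicode strings and encodes them as UTF-8 when writing.
with io.open(path, "w", encoding="utf-8") as output_file:
    output_file.write(u'\n%s;"%s"' % (u"user", text))

# Reading with the same encoding returns the original text.
with io.open(path, "r", encoding="utf-8") as input_file:
    round_trip = input_file.read()
```

Every string written through this handle has to be unicode on Python 2, hence the `u` prefixes.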

Urllib2 not parsing the tweets

Hello,

I see a difference between the result returned by this URL:
https://twitter.com/i/search/timeline?f=tweets&q=%20since%3A2014-01-01%20%23axa%2B%23environment&src=typd&max_position=

And the JSONResponse below:
{"min_position":"TWEET--","has_more_items":false,"items_html":"\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n","new_latent_count":0,"focused_refresh_interval":30000}

Do you experience the same issue?

It seems that it could be due to the Internet Service Provider, but I don't understand why.

Fen
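
A quick way to tell an empty page from a parsing problem is to decode the JSON and look at `items_html` directly. The sketch below uses the response body quoted in this report, with the run of newlines shortened; `has_tweets` is a hypothetical helper:

```python
import json

# Response body as reported above (newline run shortened); a raw string keeps
# the \n sequences as JSON escapes rather than Python ones.
raw = r'{"min_position":"TWEET--","has_more_items":false,"items_html":"\n\n \n","new_latent_count":0,"focused_refresh_interval":30000}'

response = json.loads(raw)


def has_tweets(payload):
    # items_html made only of whitespace means the page carried no tweet markup.
    return bool(payload.get("items_html", "").strip())
```

For this payload both `has_tweets(response)` and `response["has_more_items"]` are false, so the server returned no tweet markup at all and there was nothing for the client to parse.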

cannot import name Pseudo

I'm running this in Canopy and I believe I have my search string correct, but I keep getting

ImportError: cannot import name Pseudo

I've tried installing this package manually, but I can't get past this error. I'm fairly new to Python. Is there something I'm doing wrong?

Get all tweets which mention a particular user

I am trying to get tweets which are addressed to a particular user, say Barack Obama. I tried both the querysearch and username methods with the parameter " @BarackObama", but some tweets were missing from the CSV file. Could you tell me the proper way to do this?

Badly chosen separator for csv

Using ';' as the separator is a bad decision because lots of tweets have ';' in their text, and CSV parsers get confused by that.

I had scripts running for days gathering tweets (.csv), and now that I want to use them with R, my data.frames are badly formatted. I've lost hours on data cleaning and so far I can't even parse these CSV files!

Any ideas? I can't start gathering the data again because I need it for tomorrow, so I'm trying to figure out a way to clean this data.
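
For files that have already been collected, one possible repair relies on the column order seen in the exporter's format string elsewhere on this page: four fields before the quoted text and five after it. If none of those nine fields contains ';', the text is whatever sits between them. A sketch under that assumption (`split_got_row` is a hypothetical helper, and tweets whose text spans several lines are not handled):

```python
def split_got_row(line):
    """Split one exported line into 10 fields, tolerating ';' inside the tweet text."""
    # Take the first four fields from the left; the remainder starts with the text.
    head = line.rstrip("\n").split(";", 4)
    if len(head) < 5:
        raise ValueError("expected at least 5 ';'-separated fields")
    # Take the last five fields from the right; what is left over is the text.
    tail = head[4].rsplit(";", 5)
    if len(tail) < 6:
        raise ValueError("expected 5 fields after the tweet text")
    text = tail[0]
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]  # drop the surrounding quotes written by the exporter
    return head[:4] + [text] + tail[1:]


sample = ('user;2015-09-30 15:25;0;2;"first; second; third";;;#tag;'
          '"649304177638817792";https://twitter.com/user/status/649304177638817792')
fields = split_got_row(sample)
```

The repaired rows can then be written back out with a CSV writer that quotes fields properly, so that R or any other parser reads them without special handling.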

getting toptweet for a given time range

Using it with --toptweet returns many tweets that are not really top tweets: they have 0 retweets and 0 likes, and each is only several seconds apart from the next row. How can one isolate only the n top tweets :) for each day, instead of having to fetch ~560-630 tweets to find one?

Setting since and until is not enough: using those parameters to get 100 observations for a month returns data from the first day only.

How to use it?

May I ask how to use this code? I have no background in programming, but I'm doing a project on this topic. After I unzip these files, which one should I use? "main.py"? Can anyone help me with this?
