
twitterscraper's People

Contributors

martinbeckut, martinkbeck


twitterscraper's Issues

scraper exception

import os
os.environ["http_proxy"] = "http://127.0.0.1:56916"
os.environ["https_proxy"] = "http://127.0.0.1:56916"
import snscrape.modules.twitter as sntwitter
from transformers import pipeline
import pandas as pd
from tqdm import tqdm
# Scrape tweets from a specific user

# Creating list to append tweet data 
tweets_list1 = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:QCompounding').get_items()):  # CharlieMunger00 Mayhem4Markets QCompounding
    if i > 20:  # number of tweets you want to scrape
        break
    tweets_list1.append([tweet.date, tweet.content, tweet.user.username, tweet.likeCount,
                         tweet.user.displayname, tweet.lang, tweet.hashtags, tweet.mentionedUsers,
                         tweet.inReplyToUser, tweet.quotedTweet, tweet.retweetedTweet, tweet.media])
    
# Creating a dataframe from the tweets list above
tweets_df1 = pd.DataFrame(tweets_list1, columns=['Datetime', 'Text', 'Username', 'Like Count',
                                                 'Display Name', 'Language', 'hashtags', 'mentionedUsers',
                                                 'inReplyToUser', 'quotedTweet', 'retweetedTweet', 'media'])

# Keep only tweets that are not replies
tf = tweets_df1[tweets_df1['inReplyToUser'].isnull()]

# Download the first media item of every tweet that has media attached
from urllib.request import urlretrieve

tf = tweets_df1[tweets_df1['media'].notnull()]
for i in range(tf.shape[0]):
    try:
        kk = str(i) + 'i'
        urlretrieve(tf.iloc[i, -1][0].fullUrl, "d:/data/photo2/{}.jpg".format(kk))
    except Exception:
        # Skip media items without a fullUrl (e.g. videos) or failed downloads
        continue

`File "e:\temp\ipykernel_16024\2936908550.py", line 14, in <cell line: 14>
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:QCompounding').get_items()): # CharlieMunger00 Mayhem4Markets QCompounding

File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\modules\twitter.py", line 680, in get_items
for obj in self._iter_api_data('https://api.twitter.com/2/search/adaptive.json', params, paginationParams, cursor = self._cursor):

File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\modules\twitter.py", line 369, in _iter_api_data
obj = self._get_api_data(endpoint, reqParams)

File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\modules\twitter.py", line 338, in _get_api_data
self._ensure_guest_token()

File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\modules\twitter.py", line 301, in _ensure_guest_token
r = self._get(self._baseUrl if url is None else url, headers = {'User-Agent': self._userAgent}, responseOkCallback = self._check_guest_token_response)

File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\base.py", line 216, in _get
return self._request('GET', *args, **kwargs)

File "D:\anaconda3\envs\tensorflow\lib\site-packages\snscrape\base.py", line 212, in _request
raise ScraperException(msg)

ScraperException: 4 requests to https://twitter.com/search?f=live&lang=en&q=from%3AQCompounding&src=spelling_expansion_revert_click failed, giving up.`

Filtering RTs and Replies

Hi Martin,

Thank you so much for your Medium blog on this tool. This tool is super useful, and you did a great job describing how to use snscrape. I'm just curious: do you know if you can filter out retweets and replies with this module? Or is there a way to tell whether the tweet you get back is an RT, a reply, part of a thread, etc.?

Thanks so much in advance.

JB
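A minimal sketch of one possible approach, assuming Twitter's -filter: search operators work through snscrape and that the Tweet objects expose inReplyToTweetId, retweetedTweet, and quotedTweet (the handle below is just an example):

import snscrape.modules.twitter as sntwitter

# Exclude replies and retweets at query time with Twitter search operators
query = 'from:jack -filter:replies -filter:retweets'

for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i > 100:
        break
    # Or classify after the fact from the Tweet object's attributes
    is_reply = tweet.inReplyToTweetId is not None   # part of a reply/thread
    is_retweet = tweet.retweetedTweet is not None
    is_quote = tweet.quotedTweet is not None
    print(tweet.id, is_reply, is_retweet, is_quote)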

Scraping irrelevant tweets

Hi, for some reason I am getting a lot of irrelevant tweets when I run this code.
I've got the dataframe set up to show which keyword was used to scrape each tweet. I get a wall of relevant tweets from each user with the keyword listed, and then a whole bunch of irrelevant tweets for which the keyword column is blank. Can anybody tell me why?

# Imports

import snscrape.modules.twitter as sntwitter
import pandas as pd
# Query by text search
# Setting variables to be used below

maxTweets = 500

# Creating list to append tweet data to
tweets_list2 = []

# Creating lists from SearchWords and TwitterHandles txt files:

keywords_list = open("SearchWords.txt", mode='r', encoding='utf-8').read().splitlines()
users_list = open("TwitterHandles.txt", mode='r', encoding='utf-8').read().splitlines()

# Using TwitterSearchScraper to scrape data and append tweets to list

for n, k in enumerate(users_list):
    for m, j in enumerate(keywords_list):
        for i,tweet in enumerate(sntwitter.TwitterSearchScraper('{} from:{} since:2020-07-07 until:2021-07-07'.format(keywords_list[m], users_list[n])).get_items()):
            if i>maxTweets:
                break
            tweets_list2.append([tweet.url, tweet.date, tweet.id, tweet.content, tweet.user.username, tweet.retweetedTweet, keywords_list[m]])

# Creating a dataframe from the tweets list above
tweets_df2 = pd.DataFrame(tweets_list2, columns=['URL', 'Datetime', 'Tweet Id', 'Text', 'Username', 'Retweet', 'Keywords'])

# Display first 5 entries from dataframe
tweets_df2.head()

# Export dataframe into a CSV
tweets_df2.to_csv('text-query-tweets9.csv', sep=',', index=False)
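One possible cause, offered as an assumption rather than a confirmed diagnosis: a blank line in SearchWords.txt yields an empty keyword, which turns the query into a bare from:<user> search that matches every tweet by that user, with nothing to put in the keyword column. A minimal sketch that strips blank entries when reading the files:

# Drop empty lines so no query degenerates into a bare 'from:<user>' search
with open("SearchWords.txt", encoding='utf-8') as f:
    keywords_list = [line.strip() for line in f if line.strip()]
with open("TwitterHandles.txt", encoding='utf-8') as f:
    users_list = [line.strip() for line in f if line.strip()]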

Query conversationID

Hello, thank you very much for this tutorial, it's great. I would like to ask how I can make a query based on a particular conversation, using its conversation ID?
Thanks in advance, and good day.
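A minimal sketch, assuming Twitter's conversation_id: search operator works through TwitterSearchScraper (the ID below is a made-up placeholder):

import snscrape.modules.twitter as sntwitter

# Placeholder ID: use the ID of the conversation's root tweet
conversation_id = 1234567890123456789

for tweet in sntwitter.TwitterSearchScraper('conversation_id:{}'.format(conversation_id)).get_items():
    print(tweet.date, tweet.user.username, tweet.content)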

Geocode and since for extracting tweets

Hi, I'm trying to extract tweets combining the geocode filter and the since filter, but every time I run it I get this error: 'Unable to find guest token'. I've run the same search using Tweepy and I do get a lot of tweets, but because of Tweepy's time constraint I'm very interested in making it work with this scraper. Do you know why this could be happening?

import snscrape.modules.twitter as sntwitter

maxTweets = 1000  # assumed cap; not shown in the original snippet
tweets_list1 = []

for i, tweet in enumerate(sntwitter.TwitterSearchScraper('covid geocode:"34.052235,-118.243683,10km" since:2021-12-24').get_items()):
    if i > maxTweets:
        break
    tweets_list1.append([tweet.url, tweet.date, tweet.id, tweet.content,
                         tweet.user.username, tweet.replyCount, tweet.retweetCount,
                         tweet.likeCount, tweet.quoteCount, tweet.source, tweet.media,
                         tweet.retweetedTweet, tweet.mentionedUsers])
print('Complete')

Also, if I want to append the coordinates to the dataframe or the country/city, what attribute of tweet. should I use?

Thanks a lot!
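On the second question, a minimal sketch of one way to capture location data, assuming the Tweet objects expose coordinates (longitude/latitude) and place attributes as in recent snscrape versions; both are usually None unless the tweet is geotagged:

for tweet in sntwitter.TwitterSearchScraper('covid geocode:"34.052235,-118.243683,10km" since:2021-12-24').get_items():
    lon = tweet.coordinates.longitude if tweet.coordinates else None
    lat = tweet.coordinates.latitude if tweet.coordinates else None
    place = tweet.place.fullName if tweet.place else None  # e.g. "Los Angeles, CA"
    print(tweet.id, lon, lat, place)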

Question about scraping last day of the month

Hi! First of all, thank you very much for this tool, this is helping me with my dissertation so much!
I'm having a problem while scraping tweets about a certain topic for a specified period of time: when I try to get tweets from, say, 01/05/2019 to 31/05/2019, I only get tweets up to the 30th.
For my dissertation I needed 10k tweets a day for the past 3 years. I built a function and it worked perfectly, extracting around 300k tweets a month, but in the end I found out that I always miss the last day of each month.
To add all the missing days, I need to download tweets from the last day of each month, but from what I understand I always have to specify a "since" date and an "until" date. So to get tweets from 1 May I specify since 01/05/19 until 02/05/19. The problem is that for the 31st of May I cannot specify an until date, as that would be the 32nd, which doesn't exist.
Am I missing something? How can I get tweets from just one specific day?
P.S. If I set the same date as the since and until date, it doesn't work.
Thank you in advance
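For what it's worth, since: is inclusive and until: is exclusive, and the until date can simply roll over into the next month. A sketch under that assumption, with a placeholder topic:

import snscrape.modules.twitter as sntwitter

# Tweets posted on 31 May 2019 only: since is inclusive, until is exclusive
query = 'your_topic since:2019-05-31 until:2019-06-01'

for tweet in sntwitter.TwitterSearchScraper(query).get_items():
    print(tweet.date, tweet.content)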

Question about scraping multiple users from a list

Hi! Thanks for your super helpful Jupyter Notebook and Medium tutorial. Really appreciate the time and effort you put into this! Quick question, how do you scrape multiple users in a list? I would ideally like to iterate through a list of usernames and use your code below:

`# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:jack').get_items()):
    if i>maxTweets:
        break
    tweets_list1.append([tweet.date, tweet.id, tweet.content, tweet.user.username])`

I tried to iterate through my list like below, but I think I'm doing something wrong.

`list = [user1, user 2, user3....]
i = 0
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:list[i]').get_items()):`

Would appreciate any advice! Thank you :)
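A minimal sketch of one way this could be written (the handles are placeholders): quote the usernames, avoid shadowing the built-in list, and build the query string per user rather than passing 'from:list[i]' as a literal:

import snscrape.modules.twitter as sntwitter

users = ['user1', 'user2', 'user3']  # placeholder handles
maxTweets = 100
tweets_list1 = []

for user in users:
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:{}'.format(user)).get_items()):
        if i > maxTweets:
            break
        tweets_list1.append([tweet.date, tweet.id, tweet.content, tweet.user.username])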

Limit of quantity of tweets with snscrape

Hi Martin
Thank you for the good work you are doing.

I was wondering if there is a limit to the number of tweets one can scrape with snscrape. And is there any limit on how far back in time the dates can go?
Thank you.
