twitter-scraper's Introduction

Twitter Scraper

Twitter's API is annoying to work with and has lots of limitations. Luckily, their frontend (JavaScript) has its own API, which I reverse-engineered. No API rate limits. No tokens needed. No restrictions. Extremely fast.

You can use this library to get the text of any user's Tweets trivially.

Installation

go get -u github.com/n0madic/twitter-scraper

Usage

Authentication

Now all methods require authentication!

Login

err := scraper.Login("username", "password")

Use your username to log in, not your email! If your account requires email confirmation, pass the email address as well:

err := scraper.Login("username", "password", "email")

If you have two-factor authentication, use code:

err := scraper.Login("username", "password", "code")

Status of login can be checked with:

scraper.IsLoggedIn()

Logout (clear session):

scraper.Logout()

If you want to save the session between restarts, you can save cookies with scraper.GetCookies() and restore them with scraper.SetCookies().

For example, save cookies:

cookies := scraper.GetCookies()
// serialize to JSON
js, _ := json.Marshal(cookies)
// save to file
f, _ := os.Create("cookies.json")
f.Write(js)
f.Close()

and load cookies:

f, _ := os.Open("cookies.json")
// deserialize from JSON
var cookies []*http.Cookie
json.NewDecoder(f).Decode(&cookies)
// load cookies
scraper.SetCookies(cookies)
// check login status
scraper.IsLoggedIn()
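
The two snippets above ignore errors for brevity. A minimal standard-library sketch of the same save/load round trip with error handling might look like the following. The helper names saveCookies, loadCookies and roundTrip are hypothetical and are not part of the library; the cookie used is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"path/filepath"
)

// saveCookies (hypothetical helper) writes cookies to path as JSON.
func saveCookies(path string, cookies []*http.Cookie) error {
	data, err := json.Marshal(cookies)
	if err != nil {
		return fmt.Errorf("marshal cookies: %w", err)
	}
	// 0600: session cookies are credentials, so keep the file private.
	return os.WriteFile(path, data, 0o600)
}

// loadCookies (hypothetical helper) reads cookies written by saveCookies.
func loadCookies(path string) ([]*http.Cookie, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cookies []*http.Cookie
	if err := json.Unmarshal(data, &cookies); err != nil {
		return nil, fmt.Errorf("unmarshal cookies: %w", err)
	}
	return cookies, nil
}

// roundTrip saves one placeholder cookie in a temporary directory,
// loads it back, and checks that name and value survived.
func roundTrip() error {
	dir, err := os.MkdirTemp("", "cookies")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "cookies.json")
	in := []*http.Cookie{{Name: "example", Value: "placeholder"}}
	if err := saveCookies(path, in); err != nil {
		return err
	}
	out, err := loadCookies(path)
	if err != nil {
		return err
	}
	if len(out) != 1 || out[0].Name != in[0].Name || out[0].Value != in[0].Value {
		return fmt.Errorf("cookie mismatch after round trip: %+v", out)
	}
	return nil
}

func main() {
	if err := roundTrip(); err != nil {
		panic(err)
	}
	fmt.Println("cookies round-tripped")
}
```

In real use, the slice passed to saveCookies would come from scraper.GetCookies(), and the slice returned by loadCookies would be handed to scraper.SetCookies().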

Open account

If you don't want to use your own account, you can try logging in as a Twitter app:

err := scraper.LoginOpenAccount()

Get user tweets

package main

import (
    "context"
    "fmt"
    twitterscraper "github.com/n0madic/twitter-scraper"
)

func main() {
    scraper := twitterscraper.New()
    err := scraper.LoginOpenAccount()
    if err != nil {
        panic(err)
    }
    for tweet := range scraper.GetTweets(context.Background(), "Twitter", 50) {
        if tweet.Error != nil {
            panic(tweet.Error)
        }
        fmt.Println(tweet.Text)
    }
}

It appears you can ask for up to 50 tweets.

Get single tweet

package main

import (
    "fmt"

    twitterscraper "github.com/n0madic/twitter-scraper"
)

func main() {
    scraper := twitterscraper.New()
    err := scraper.Login("username", "password")
    if err != nil {
        panic(err)
    }
    tweet, err := scraper.GetTweet("1328684389388185600")
    if err != nil {
        panic(err)
    }
    fmt.Println(tweet.Text)
}

Search tweets by query standard operators

Now the search only works for authenticated users!

Tweets containing “twitter” and “scraper” and “data”, filtering out retweets:

package main

import (
    "context"
    "fmt"
    twitterscraper "github.com/n0madic/twitter-scraper"
)

func main() {
    scraper := twitterscraper.New()
    err := scraper.Login("username", "password")
    if err != nil {
        panic(err)
    }
    for tweet := range scraper.SearchTweets(context.Background(),
        "twitter scraper data -filter:retweets", 50) {
        if tweet.Error != nil {
            panic(tweet.Error)
        }
        fmt.Println(tweet.Text)
    }
}

The search ends if we have 50 tweets.

See Rules and filtering for how to build standard queries.

Set search mode

scraper.SetSearchMode(twitterscraper.SearchLatest)

Options:

  • twitterscraper.SearchTop - default mode
  • twitterscraper.SearchLatest - live mode
  • twitterscraper.SearchPhotos - image mode
  • twitterscraper.SearchVideos - video mode
  • twitterscraper.SearchUsers - user mode

Get profile

package main

import (
    "fmt"
    twitterscraper "github.com/n0madic/twitter-scraper"
)

func main() {
    scraper := twitterscraper.New()
    scraper.LoginOpenAccount()
    profile, err := scraper.GetProfile("Twitter")
    if err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", profile)
}

Search profiles by query

package main

import (
    "context"
    "fmt"
    twitterscraper "github.com/n0madic/twitter-scraper"
)

func main() {
    scraper := twitterscraper.New().SetSearchMode(twitterscraper.SearchUsers)
    err := scraper.Login("username", "password")
    if err != nil {
        panic(err)
    }
    for profile := range scraper.SearchProfiles(context.Background(), "Twitter", 50) {
        if profile.Error != nil {
            panic(profile.Error)
        }
        fmt.Println(profile.Name)
    }
}

Get trends

package main

import (
    "fmt"
    twitterscraper "github.com/n0madic/twitter-scraper"
)

func main() {
    scraper := twitterscraper.New()
    trends, err := scraper.GetTrends()
    if err != nil {
        panic(err)
    }
    fmt.Println(trends)
}

Use Proxy

HTTP(S) and SOCKS5 proxies are supported.

with HTTP

err := scraper.SetProxy("http://localhost:3128")
if err != nil {
    panic(err)
}

with SOCKS5

err := scraper.SetProxy("socks5://localhost:1080")
if err != nil {
    panic(err)
}
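
For readers who want to see what a proxy address of this form means at the HTTP level, here is a standard-library sketch. It is not the library's SetProxy implementation; newProxyClient is a hypothetical helper that builds a plain net/http client from the same kind of address and rejects schemes other than http, https and socks5.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newProxyClient (hypothetical helper) returns an HTTP client that sends
// its requests through the proxy at proxyAddr.
func newProxyClient(proxyAddr string) (*http.Client, error) {
	u, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, fmt.Errorf("parse proxy address: %w", err)
	}
	switch u.Scheme {
	case "http", "https", "socks5":
		// Schemes that net/http's Transport.Proxy understands.
	default:
		return nil, fmt.Errorf("unsupported proxy scheme %q", u.Scheme)
	}
	if u.Host == "" {
		return nil, fmt.Errorf("proxy address %q has no host", proxyAddr)
	}
	transport := &http.Transport{Proxy: http.ProxyURL(u)}
	return &http.Client{Transport: transport}, nil
}

func main() {
	addrs := []string{
		"http://localhost:3128",
		"socks5://localhost:1080",
		"ftp://localhost:21", // rejected: not a supported proxy scheme
	}
	for _, addr := range addrs {
		_, err := newProxyClient(addr)
		fmt.Printf("%s: err=%v\n", addr, err)
	}
}
```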

Delay requests

Add delay between API requests (in seconds)

scraper.WithDelay(5)
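
To illustrate the idea of spacing out requests, here is a small standalone sketch of a throttle that enforces a minimum interval between calls. It only shows the general technique and makes no claim about how WithDelay is implemented; the throttle type and newThrottle are hypothetical, and the demo uses milliseconds so that it finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// throttle enforces a minimum interval between successive calls to wait.
type throttle struct {
	interval time.Duration
	last     time.Time
}

// newThrottle (hypothetical helper) builds a throttle from milliseconds.
func newThrottle(ms int) *throttle {
	return &throttle{interval: time.Duration(ms) * time.Millisecond}
}

// wait blocks until at least interval has passed since the previous call
// returned, and reports how long it slept. The first call never sleeps.
func (t *throttle) wait() time.Duration {
	var slept time.Duration
	if !t.last.IsZero() {
		if remaining := t.interval - time.Since(t.last); remaining > 0 {
			time.Sleep(remaining)
			slept = remaining
		}
	}
	t.last = time.Now()
	return slept
}

func main() {
	t := newThrottle(50)
	for i := 1; i <= 3; i++ {
		slept := t.wait()
		// A real program would issue its API request here.
		fmt.Printf("request %d (slept %v)\n", i, slept)
	}
}
```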

Load timeline with tweet replies

scraper.WithReplies(true)

twitter-scraper's People

Contributors

appleboy, berkaltiok, coolestowl, cute-angelia, dependabot[bot], helios2003, hmsta, justhumanz, l33t7here, laikee99, mind1949, n0madic, petrusz, regynald, rose1988c, rouanth, tolantop, veetaha, vtgare, windowsdeveloperwannabe, xiscocapllonch

twitter-scraper's Issues

Some users are not fetched

I'm having an issue with fetching some users; e.g., fetching @flockinco doesn't seem to work.

Many others I've tried do work, but this one specifically doesn't. Anything I can do to debug this?

Thanks!

sample code turns to return 400 Bad Request

Hi,

The sample code in the README has been returning 400 Bad Request since last week.
I'm running similar code on an AWS EC2 instance with a weekly cron job.

I confirm this code worked since 12/13(SUN).
Any changes on Twitter frontend API?

package main

import (
	"context"
	"fmt"
	twitterscraper "github.com/n0madic/twitter-scraper"
)

func main() {
	for tweet := range twitterscraper.SearchTweets(context.Background(),
		"twitter scraper data", 50) {
		if tweet.Error != nil {
			panic(tweet.Error)
		}
		fmt.Println(tweet.Text)
	}
}

=> panic: response status: 400 Bad Request

regards,

returns same username

Both 'tweet.username' and 'tweet.inreplytostatus.username' return the same username for all scraped tweets of type 'reply'.
Is this the correct response by the API?

setBearerToken is private

func (s *Scraper) setBearerToken(token string) {
	s.bearerToken = token
	s.guestToken = ""
}

setBearerToken method is private. Shouldn't it be public?

Geo Location along with a tweet

How can I get the geolocation along with a tweet? There is no such field in the Tweet struct.
Please consider this.

Bad Guest Token Without `X-Csrf-Token`

Hello! Love the work you've done on this library. I'm hoping this breaking change can be worked around, but I'm nervous at the moment.

Beginning today, my scrapes always return Bad guest token. I copied some API calls from my browser as curl, then removed headers to match what gets set by this library. Responses started breaking when I removed the X-Csrf-Token header, leading me to believe this header is now required for guests. Without it, my curl calls get the same Bad guest token error.

So far, I haven't had any luck finding the token. I suspect it's obfuscated somewhere in one of the inline HTML <script> tags, but the amount of minification makes it tough to locate.

how to connect tweet and its reply?

I am getting tweets with replies, but I don't understand how to connect a tweet with its reply, because I did not find any field for this in the 'Tweet' struct.
Please provide a solution.

Use Advanced search endpoint to avoid rate limits

Use advanced search to avoid rate limits when fetching tweets from a ton of different accounts.

One limitation I found is that the query can have a maximum of 654 characters (after URL encoding); otherwise it fails.
Example query (from:someone) since:2022-4-21
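
One way to stay under the reported limit when querying many accounts is to pack them into several queries. The sketch below assumes that url.QueryEscape approximates the encoding that is counted against the 654-character limit, and that accounts can be combined as (from:a OR from:b); the helpers buildQuery, encodedLen and batchQueries, and the account names, are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// maxEncodedLen is the limit reported above (characters after URL encoding).
const maxEncodedLen = 654

// buildQuery produces "(from:a OR from:b) since:DATE".
func buildQuery(accounts []string, since string) string {
	parts := make([]string, len(accounts))
	for i, a := range accounts {
		parts[i] = "from:" + a
	}
	return "(" + strings.Join(parts, " OR ") + ") since:" + since
}

// encodedLen measures the query after URL encoding.
// Assumption: url.QueryEscape approximates the encoding that is counted.
func encodedLen(query string) int {
	return len(url.QueryEscape(query))
}

// batchQueries greedily packs accounts into as few queries as possible
// while keeping each query within maxEncodedLen. A single account whose
// query alone exceeds the limit is still emitted on its own.
func batchQueries(accounts []string, since string) []string {
	var queries []string
	var batch []string
	for _, a := range accounts {
		candidate := append(append([]string{}, batch...), a)
		if len(batch) > 0 && encodedLen(buildQuery(candidate, since)) > maxEncodedLen {
			queries = append(queries, buildQuery(batch, since))
			batch = []string{a}
			continue
		}
		batch = candidate
	}
	if len(batch) > 0 {
		queries = append(queries, buildQuery(batch, since))
	}
	return queries
}

func main() {
	accounts := make([]string, 100)
	for i := range accounts {
		accounts[i] = fmt.Sprintf("example_user_%03d", i) // placeholder names
	}
	for i, q := range batchQueries(accounts, "2022-4-21") {
		fmt.Printf("query %d: %d encoded characters\n", i+1, encodedLen(q))
	}
}
```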

You can find this request by going to Twitter advanced search, pressing search, and finding the request that contains adaptive.json in Chrome DevTools. Then right-click it -> Copy -> Copy as cURL (bash), go to Postman, and press Import -> Raw -> Paste Clipboard -> Import.

User Tweets Endpoint Response Headers

Advanced Search Endpoint Response Headers

I tested sending 200+ requests per second and no rate limiting occurred.

404 and 400 code results

Requests always return 404, and GetProfile returns 400 Bad Request.

Has Twitter perhaps changed its frontend API?

response status: 400 Bad Request

Hi,

Great library, very useful.
I think Twitter made a change somewhere. Code which used to work to get tweets from a profile now returns "response status: 400 Bad Request".

Other functions, such as this one, also broke:

package main

import (
    "fmt"
    twitterscraper "github.com/n0madic/twitter-scraper"
)

func main() {
    profile, err := twitterscraper.GetProfile("twitter")
    if err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", profile)
}

Thanks again for the work

Quotes are sometimes missing data

Sometimes, if a user has replied to a tweet by quoting it, the scraper will return truncated data for that tweet.

Here's a sample tweet showing the problem - any references to Vsauce2 are missing in the response payload.
Link: https://twitter.com/kevinlieber/status/1237110897597976576

Payload:

{
   "Hashtags":null,
   "HTML":"my new video features \u003ca href=\"https://twitter.com/NetHistorian\"\u003e@NetHistorian\u003c/a\u003e as my winkey. you should probably watch this one.",
   "ID":"1237110897597976576",
   "IsQuoted":true,
   "IsPin":false,
   "IsReply":false,
   "IsRetweet":false,
   "Likes":327,
   "PermanentURL":"https://twitter.com/kevinlieber/status/1237110897597976576",
   "Photos":null,
   "Replies":3,
   "Retweets":7,
   "Retweet":{
      "ID":"",
      "TimeParsed":"0001-01-01T00:00:00Z",
      "Timestamp":0,
      "UserID":"",
      "Username":""
   },
   "Text":"my new video features @NetHistorian as my winkey. you should probably watch this one.",
   "TimeParsed":"2020-03-09T20:19:57Z",
   "Timestamp":1583785197,
   "URLs":null,
   "UserID":"16595025",
   "Username":"kevinlieber",
   "Videos":null
}

Tweet Replies

Given the timeline of some Users, can I get ( reply text of tweets, username who replied etc.)?

Thanks.

Access retweet info?

When viewing a tweet that was retweeted, the JSON presumably contains the ID/timestamp/UserID/Username that retweeted it. Is it possible to also retrieve the original poster's UserID/timestamp/ID, and retrieve their username in a better way than regexing 'RT *:'?

How to get different size about tweet's pic?

It seems that Tweet.Photos currently provides only name=large.
However, some pictures with name=4096x4096 could provide better quality. Will this scraper expose more picture sizes in the future?
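
Until something like this is supported, a caller could rewrite the name parameter of a photo URL on their own side. The sketch below assumes the photo URL carries its size in a name query parameter, as described above; withSize is a hypothetical helper, the URL is a placeholder, and whether the server honours a given size for a given picture is not verified here.

```go
package main

import (
	"fmt"
	"net/url"
)

// withSize (hypothetical helper) returns photoURL with its "name" query
// parameter set to size, leaving the other parameters untouched.
func withSize(photoURL, size string) (string, error) {
	u, err := url.Parse(photoURL)
	if err != nil {
		return "", fmt.Errorf("parse photo URL: %w", err)
	}
	q := u.Query()
	q.Set("name", size)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Placeholder URL; real values would come from Tweet.Photos.
	orig := "https://pbs.twimg.com/media/EXAMPLE?format=jpg&name=large"
	bigger, err := withSize(orig, "4096x4096")
	if err != nil {
		panic(err)
	}
	fmt.Println(bigger)
}
```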

Error trying to access Trends in twitter

Failed during accessing, twitterscraper.New()

Error:

panic: Get "https://twitter.com/i/api/2/guide.json?candidate_source=trends&cards_platform=Web-12&count=20&entity_tokens=false&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2CsuperFollowMetadata&include_blocked_by=1&include_blocking=1&include_can_dm=1&include_can_media_tag=1&include_cards=1&include_entities=true&include_ext_alt_text=true&include_ext_has_nft_avatar=1&include_ext_media_availability=true&include_ext_media_color=true&include_ext_sensitive_media_warning=true&include_followed_by=1&include_mute_edge=1&include_page_configuration=false&include_profile_interstitial_type=1&include_quote_count=true&include_reply_count=1&include_tweet_replies=false&include_user_entities=true&include_want_retweets=1&send_error_codes=true&simple_quoted_tweet=true&skip_status=1&tweet_mode=extended": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Could you leave any contact to you?

I'd like to talk with you about something regarding your repo. To make it faster and easier, could you please share a Discord or Twitter handle, or something similar, so we can communicate?

Multiple query filters

Love the library. Implementing it in my first Go project right now. I am having the issue where I cannot use multiple filters in a query such as:

...SearchTweets(context.Background(), "NASA -filter:retweets within_time:1h",50)

Now obviously NASA is frequently tweeted about, but if I choose something like JASMY, it gives me tweets from outside the hour. I've used this same exact string literal in the Twitter search bar and it seems to function properly there. Am I missing something from your documentation? It seemed to me that we could use the exact same queries as on Twitter itself.

no required module provides package

Recently installed the package and facing the following issue:

search-tweets.go:6:5: no required module provides package github.com/n0madic/twitter-scraper: go.mod file not found in current directory or any parent directory; see 'go help modules'

'search-tweets.go' file is created by me in the parent directory outside of 'twitter-scraper' package.

Sorry, that page does not exist.

tweet, err := scraper.GetTweet("907700934486003712")
if err != nil {
	log.Println(err)
}
fmt.Println(tweet.Text)

result is error

2021/07/04 15:22:32 response status 404 Not Found: {"errors":[{"code":34,"message":"Sorry, that page does not exist."}]}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xd0 pc=0x127f024]
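
The panic in the log is consistent with the snippet logging the error and then still reading tweet.Text. Assuming the call returns a nil tweet together with the error, a caller can avoid the crash by returning early. The sketch below uses a hypothetical stand-in getTweet and a minimal Tweet type rather than the real library, so it only illustrates the error-handling pattern.

```go
package main

import (
	"errors"
	"fmt"
)

// Tweet is a stand-in for the library's tweet type; only Text is modeled.
type Tweet struct {
	Text string
}

// getTweet is a hypothetical stand-in for scraper.GetTweet. It fails for
// every ID except "1" and returns a nil pointer alongside the error.
func getTweet(id string) (*Tweet, error) {
	if id != "1" {
		return nil, errors.New("response status 404 Not Found")
	}
	return &Tweet{Text: "hello"}, nil
}

// tweetText returns the tweet text, or the error, and never dereferences
// a nil tweet.
func tweetText(id string) (string, error) {
	tweet, err := getTweet(id)
	if err != nil {
		return "", fmt.Errorf("get tweet %s: %w", id, err)
	}
	return tweet.Text, nil
}

func main() {
	for _, id := range []string{"1", "907700934486003712"} {
		text, err := tweetText(id)
		if err != nil {
			fmt.Println("error:", err)
			continue // skip this tweet instead of using a nil result
		}
		fmt.Println(text)
	}
}
```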

Config Cookie and x-csrf-token

Some Twitter accounts are protected: their content is visible only after the account owner has accepted you as a follower.

For such accounts, the scraper API returns:

{
  "errors": [
    {
      "message": "Not authorized to view the specified user.",
      "code": 22
    }
  ]
}

Can you support a config file with Cookie and x-csrf-token? Thank you.

get replies for a given tweet

Is it possible to get the replies for a given tweet ID?
It would be awesome if it is possible to specify a count parameter for the number of replies.

How can I get user favourites(following) list

Given a user's nickname, I need to get the list of accounts that this user follows.
I see that the function is currently not implemented. Do you have any ideas how this can be done?

error in getting username of tweet reply

The code "tweet.InReplyToStatus.Username" generates an error while accessing the tweet of type 'reply'.

Please check and assist to resolve the problem.
Thanks.

why?

I don't understand why it's reporting an error, my browser can open twitter, and after I add a cookie and XCsrfToken, I still get this result.

HashTag Scraping with metadata

While hashtag scraping how can I get metadata i.e. 'Created at' and user information(Handle, Screen Name) for each tweet?

Rate Limit Exceeded

During scraping the following error has been observed:

panic: response status 429 Too Many Requests: {"errors":[{"code":88,"message":"Rate limit exceeded."}]}

Process exiting with code: 2 signal: false

search problem

search queries like ok (from:Chako_tay OR from:Jinek_RTL)

didn't work until i removed

query = url.PathEscape(query)

in search.go file
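
One plausible explanation, stated here as an assumption about search.go rather than a confirmed diagnosis, is double escaping: if the query is first passed through url.PathEscape and then added to url.Values and encoded again, the server decodes it only once and sees literal %20 and %28 sequences. The standalone sketch below demonstrates that effect with the standard library; encodeOnce, encodeTwice and decodeQ are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeOnce puts the raw search text into url.Values, so Encode escapes
// it exactly once.
func encodeOnce(query string) string {
	v := url.Values{}
	v.Set("q", query)
	return v.Encode()
}

// encodeTwice pre-escapes the text with url.PathEscape and then encodes
// it again through url.Values, which double-escapes it.
func encodeTwice(query string) string {
	v := url.Values{}
	v.Set("q", url.PathEscape(query))
	return v.Encode()
}

// decodeQ returns the value of "q" after one round of decoding, which is
// what a server would normally see.
func decodeQ(encoded string) string {
	v, err := url.ParseQuery(encoded)
	if err != nil {
		return ""
	}
	return v.Get("q")
}

func main() {
	q := "ok (from:Chako_tay OR from:Jinek_RTL)"
	fmt.Println("escaped once: ", encodeOnce(q))
	fmt.Println("escaped twice:", encodeTwice(q))
	fmt.Println("once decodes to original: ", decodeQ(encodeOnce(q)) == q)
	fmt.Println("twice decodes to original:", decodeQ(encodeTwice(q)) == q)
}
```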

GetTweets without replies

Hi,

Could we change the functions GetTweets and FetchTweets to add a bool parameter "withReplies"?
Right now it will fetch reply tweets as well:

q.Add("with_replies", "true")

Alternative would be to add "IsReply" to the Tweet struct, but not sure if that information is contained in the html, didn't check yet.

regards,
mo

edit
I can make a pull request with the changes, but first wanted to check with you if it would be ok.

Issues about HTML field

The latest version is perfect, i can get more content than before.
But at tw.HTML field the regular expression of user name (reUsername) is missing the digit rules, such as @RonaldAraujo939 can't be matched.
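
For illustration, a pattern that also accepts digits could look like the sketch below. It is not the library's reUsername: reMention and linkMentions are hypothetical, the 1 to 15 character limit on letters, digits and underscores is an assumption about Twitter handles, and the link format follows the HTML field shown in the payload of another issue on this page.

```go
package main

import (
	"fmt"
	"regexp"
)

// reMention matches '@' followed by 1 to 15 letters, digits or underscores.
// Assumption: this reflects the allowed characters in Twitter handles.
var reMention = regexp.MustCompile(`@([A-Za-z0-9_]{1,15})`)

// linkMentions (hypothetical helper) wraps every mention in a profile link.
func linkMentions(text string) string {
	return reMention.ReplaceAllString(text, `<a href="https://twitter.com/${1}">@${1}</a>`)
}

func main() {
	fmt.Println(linkMentions("thanks @RonaldAraujo939 and @NetHistorian"))
}
```

This simple pattern would also match the part after the @ in an email address, so a production version would likely need a check on the preceding character.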

And some emoji that defined by Twitter can't be displayed. I remember that the previous version had image resources of emoji.

Not getting userid or name who retweeted

SearchTweets function only returns the userid or name of the person who creates the tweet. It does not return the username/id of the person who retweets

Can you provide the example use of Retweet Struct?

Thanks.

Description claims "No tokens needed", yet `api.go` contains a token

The README says in the first paragraph that no tokens are needed. What is the function of the bearerToken in api.go? Can it just be used as-is, or should it be replaced with something?

There is also a guestToken field on the Scraper, though it looks like this can be refreshed any time, so it presumably doesn't have much significance.

Getting just few tweets and no more.

Hello, I'm trying to scrape tweets of an user using FetchTweets but I have an issue. I have something like this (that is working, but limited)

maxTweetsNbr := 22500 // qty of tweets of user
tweetsNbr := 0
nextCursor := ""
okToContinue := true

for tweetsNbr < maxTweetsNbr &&  okToContinue {
    tweets, cursor, err := scraper.FetchTweets(username, 200, nextCursor)
    if err != nil {
        fmt.Println(err.Error())
    }
    for _, tweet := range tweets {
        _, err = tweetsCollection.InsertOne(context.TODO(), tweet)
        if err != nil {
            fmt.Println("Error profilesCollection.UpdateOne: ", err.Error())
            return nil, err
        }
    }
    okToContinue = err == nil && cursor != nextCursor && cursor != "" && len(tweets) > 0

    tweetsNbr += len(tweets)
    nextCursor = cursor

    fmt.Println("page ok: ", tweetsNbr)
}

The issue is that len(tweets) always drops to 0 at the same point: for example, it pulls 853 tweets (of approximately 22k) and then no more tweets are retrieved. I tried with a proxy and with WithDelay, but no luck.

If I try another Twitter account (with 88k tweets), the same thing happens: it pulls, for example, 900 tweets and then no more, even if I retry many times.
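
Separately from the cut-off itself, the loop above prints a fetch error and then carries on with the same iteration. A cursor loop that stops on an error, an empty page, or a cursor that does not advance can be sketched as follows. The fetchPage signature only mirrors the shape of the FetchTweets call shown above, fakePages is a made-up data source, and this sketch does not explain or work around the limit on how many tweets are returned.

```go
package main

import (
	"errors"
	"fmt"
)

// fetchPage is a hypothetical signature: it returns one page of items,
// the cursor for the next page, and an error.
type fetchPage func(cursor string) (items []string, next string, err error)

// collect pages through fetch until limit items are gathered, an error
// occurs, or the source is exhausted.
func collect(fetch fetchPage, limit int) ([]string, error) {
	var all []string
	cursor := ""
	for len(all) < limit {
		items, next, err := fetch(cursor)
		if err != nil {
			return all, fmt.Errorf("fetch page at cursor %q: %w", cursor, err)
		}
		all = append(all, items...)
		// Stop when the page is empty or the cursor does not advance.
		if len(items) == 0 || next == "" || next == cursor {
			break
		}
		cursor = next
	}
	if len(all) > limit {
		all = all[:limit]
	}
	return all, nil
}

// fakePages returns a fetchPage that serves n pages of two items each.
func fakePages(n int) fetchPage {
	return func(cursor string) ([]string, string, error) {
		page := 0
		if cursor != "" {
			if _, err := fmt.Sscanf(cursor, "page-%d", &page); err != nil {
				return nil, "", errors.New("bad cursor")
			}
		}
		if page >= n {
			return nil, "", nil
		}
		items := []string{
			fmt.Sprintf("tweet %d-a", page),
			fmt.Sprintf("tweet %d-b", page),
		}
		return items, fmt.Sprintf("page-%d", page+1), nil
	}
}

func main() {
	tweets, err := collect(fakePages(3), 100)
	if err != nil {
		panic(err)
	}
	fmt.Println("collected:", len(tweets))
}
```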

Too many requests error on Heroku

I deployed this on Heroku and on any request I get

response status 429 Too Many Requests
2021-02-22T15:02:25.438432+00:00 app[web.1]: /tmp/build_89d618b3/main.go:77 (0x8fd6ee)
2021-02-22T15:02:25.438432+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/gin-gonic/[email protected]/context.go:161 (0x8df4b9)
2021-02-22T15:02:25.438433+00:00 app[web.1]: /tmp/build_89d618b3/main.go:111 (0x8fe0a2)
2021-02-22T15:02:25.438433+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/gin-gonic/[email protected]/context.go:161 (0x8df4b9)
2021-02-22T15:02:25.438433+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/gin-gonic/[email protected]/recovery.go:83 (0x8f3349)
2021-02-22T15:02:25.438434+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/gin-gonic/[email protected]/context.go:161 (0x8df4b9)
2021-02-22T15:02:25.438434+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/gin-gonic/[email protected]/logger.go:241 (0x8f2470)
2021-02-22T15:02:25.438434+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/gin-gonic/[email protected]/context.go:161 (0x8df4b9)
2021-02-22T15:02:25.438435+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:409 (0x8e963c)
2021-02-22T15:02:25.438435+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:367 (0x8e8d3d)
2021-02-22T15:02:25.438435+00:00 app[web.1]: /tmp/codon/tmp/cache/go1.12.17/go/src/net/http/server.go:2774 (0x6b7167)
2021-02-22T15:02:25.438436+00:00 app[web.1]: /tmp/codon/tmp/cache/go1.12.17/go/src/net/http/server.go:1878 (0x6b2d50)
2021-02-22T15:02:25.438436+00:00 app[web.1]: /tmp/codon/tmp/cache/go1.12.17/go/src/runtime/asm_amd64.s:1337 (0x459a00)

I reduced the number of tweets to 1 and still the same error.

Is this related to this package or Heroku or Gin?

Ambiguous wording in Readme

"It appears you can ask for up to 50 tweets (limit ~3200 tweets)." Does this mean you can ask for 50 tweets per go for up to 3200 tweets?

how to find pinned tweets from a specific user?

I've tried GetProfile() and GetTweets() but I can't get the top pinned tweet of a user, which is normally displayed when I access the profile via browser/app.

With FetchTweets(Username,"") I get 400 Bad Request btw.

Any hints? :-)
