twitter-rss-scraper-google-apps-script's Introduction

Twitter RSS Feeds Scraper - Google Apps Script

A Google Apps Script to scrape (parse) the Twitter Site and generate an RSS feed.

After Twitter shut down its public RSS feeds in summer 2013, several solutions emerged that generate RSS feeds either through the Twitter API or by scraping the Twitter site (e.g. RSS4Twittter or TwitRSS.me). These services have since attracted many users and suffer from the resulting traffic, which leads to downtime or to countermeasures such as IP bans for excessive data use.

Luckily, the code of several of these projects is freely available, so anyone can deploy their own Twitter-to-RSS server. This project is meant for people who do not have their own server on which to deploy one of the full-blown server-side solutions, e.g. those from this Stack Exchange discussion.

For this project you only need a Google account to upload the code to Google Apps Script. After doing so you will be able to access your RSS feeds through a public Google Apps URL and thus be able to add these feeds to your RSS Feed Reader.

Features

Because the public Twitter site is scraped to generate the RSS feed, no Twitter login or authentication is required to use this code. This means you do not need a Twitter account to read Tweets through RSS. On the other hand, it implies that only public Tweets can be accessed.
Currently the project only supports generating RSS feeds for a user timeline, not for Twitter searches or lists. A similar project, Twitter-RSS-Google-Apps-Script, uses the Twitter API to connect to Twitter and generate the RSS feed. This, however, requires that you have a Twitter account and are able to authenticate against the Twitter API. In addition to user timelines, that project also provides access to Twitter searches, favorites and lists.

How to install

Requirements

You need a Google account (Gmail etc.) to log in to Google Apps Script.

1. Install

There are two ways to set up the project:

  • Use this link to create a new Google Apps project with the script code.

or alternatively create a scripts project by hand:

  • Log in to https://script.google.com/, go to "Start scripting" and create a new "Blank Project". Delete any code in your new project, then copy and paste the content of the main-script.js file in this repository into the code editor of your newly created Google Script project. Save the file and give the project a name (e.g. "Twitter-to-RSS").

2. Deploy on the Web

Next you need to make the script accessible through a public URL. Open the "Publish" menu and choose "Deploy as Web App". Enter a comment for the project version (e.g. "Initial Setup") and save it as a new version. Next choose "Execute the app as: me" (the option shows your own account's email address) and, below that, grant access to everyone by choosing "Who has access to the app: Anyone, even anonymous". Finally, click "Deploy" to make the script accessible. This last step gives you the public URL for the project, which will look similar to this one: https://script.google.com/macros/s/AKfycbzTTuLQDAU....../exec

Authorization for Web Access

When you set up and deploy the script for the first time, the last "Deploy" step above will ask for authorization. Google requires this authorization before the script may access resources on the web, and web access is necessary to fetch and parse the Twitter site. In the first popup window click "Continue", then click "Accept" in the following window, which displays the permissions required by the script.

If no error occurred in the above steps, the script is now functional. If you do run into errors, please file a bug in the issue tracker describing the steps you took.
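
To illustrate what the deployed web app does on each request, here is a minimal sketch. It is not the code from main-script.js: the helper names `escapeXml` and `buildRssFeed` and the tweet shape `{text, url}` are assumptions made for illustration, and the scraping step is left out. `doGet` and `ContentService` are the standard Apps Script entry point and output service for web apps.

```javascript
// Escape the five XML special characters so tweet text cannot break the feed.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a minimal RSS 2.0 document from an array of tweets.
// The tweet shape {text, url} is hypothetical.
function buildRssFeed(user, tweets) {
  var items = tweets.map(function (t) {
    return '<item><title>' + escapeXml(t.text) + '</title>' +
           '<link>' + escapeXml(t.url) + '</link></item>';
  }).join('');
  return '<?xml version="1.0" encoding="UTF-8"?>' +
         '<rss version="2.0"><channel>' +
         '<title>Twitter / ' + escapeXml(user) + '</title>' +
         '<link>https://twitter.com/' + encodeURIComponent(user) + '</link>' +
         '<description>Tweets by ' + escapeXml(user) + '</description>' +
         items +
         '</channel></rss>';
}

// Apps Script calls doGet(e) for every GET request to the /exec URL.
function doGet(e) {
  var user = e.parameter.user;
  var tweets = []; // scraping step omitted in this sketch
  return ContentService.createTextOutput(buildRssFeed(user, tweets))
    .setMimeType(ContentService.MimeType.RSS);
}
```

Keeping the feed construction in a plain function separate from `doGet` makes it possible to check the XML output without deploying the script.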

3. Add Script URL to your RSS Reader

Copy the URL from the previous step into your RSS feed reader and append the parameter for the Twitter username whose timeline you want to parse, as in the example below:
https://script.google.com/macros/s/AKfycbzTTuLQDAU....../exec?user=ciderpunx
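
If you subscribe to several users, the feed URLs can also be assembled programmatically. The sketch below assumes only the `user` parameter shown above; the function name `buildFeedUrl` and the `YOUR_SCRIPT_ID` placeholder are hypothetical and must be replaced with the URL you received when deploying.

```javascript
// Append the user parameter to the web app URL from the deploy step.
function buildFeedUrl(execUrl, user) {
  return execUrl + '?user=' + encodeURIComponent(user);
}

// Placeholder URL; substitute the one shown after deployment.
var feedUrl = buildFeedUrl(
  'https://script.google.com/macros/s/YOUR_SCRIPT_ID/exec', 'ciderpunx');
```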

Optional Parameters:

How to update

Suppose you have already installed the script (see above for instructions) and want to update it to a newer version from this repository, e.g. one that contains bugfixes. You cannot use the copy link described above under Install, as that would create a new project and result in a different public URL for your feeds. Instead, log in to https://script.google.com/, go to "Start scripting" and open the project, either from the list of recent projects or by choosing "File -> Open" from the menu. Select the code in the script file and delete it, then copy and paste the content of the main-script.js file in this repository into the code editor. Save the file and create a new project version by choosing "File -> Manage Versions" from the menu: enter a description for the new version (e.g. "fixes this bla bug") and choose OK. Next, deploy the new version by choosing "Publish -> Deploy as Web App" from the menu and selecting the latest version from the drop-down list. Press OK; from now on the latest code is executed whenever the public URL of your feeds script is requested.

Acknowledgements

The code is based on TwitRSS.me, whose repository can be found here. It has been extended with new features, bugs have been fixed, and it has been converted to a Google Apps Script to make it easier to use.

twitter-rss-scraper-google-apps-script's People

Contributors

bmihaila

twitter-rss-scraper-google-apps-script's Issues

Fix the insertion of links

HTTP links, hashtags and @user tags are currently inserted back into the tweet text, but in a very hacky and unreliable way: they are placed at any double spaces inside a tweet.

A better way to re-insert the links into the tweet must be found. At present, however, this depends on the JSON output, which contains no place markers for where a link occurred in the tweet.
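
To make the problem concrete, the heuristic described in this issue can be reconstructed roughly as follows. This is an illustrative sketch based only on the description here, not the project's actual code; the function name `insertLinksAtDoubleSpaces` is hypothetical.

```javascript
// Replace each double space in the text with the next link, in order.
// Gaps left over after the links run out are kept unchanged.
function insertLinksAtDoubleSpaces(text, links) {
  var i = 0;
  return text.replace(/ {2}/g, function (gap) {
    return i < links.length ? ' ' + links[i++] + ' ' : gap;
  });
}

// Works when the only double space marks the removed link:
var ok = insertLinksAtDoubleSpaces('Read  now', ['http://t.co/x']);

// Goes wrong when the tweet itself contains a double space earlier on:
var wrong = insertLinksAtDoubleSpaces('Wait.  Read  now', ['http://t.co/x']);
```

In the second call the link lands after "Wait." instead of after "Read", which is the kind of misplacement that real place markers would avoid.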

DNS Error

Hi,

When I deploy the app, I get a page with:

DNS error: http://query.yahooapis.com/v1/public/yql?format=json&diagnostics=true&q=use%20%22store%3A%2F%2FkhMqnFudV9aCgu92gxsnzn%22%20as%20htmlbackagain%3B%20SELECT%20*%20FROM%20htmlbackagain%20WHERE%20url%3D%22https%3A%2F%2Ftwitter.com%2FDataSciFact%2F%3Fcount%3D100%22%20AND%20xpath%3D%22%2F%2Fli%5Bcontains(%40class%2C%20'js-stream-item')%5D%22 (line 90, file "main")

Error Report

TypeError: Cannot read property "li" from null. (line 199, file "main")

Code: for (var i = 0; i < jsonTweets.li.length; i++) {
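
The error means that `jsonTweets` was null when the loop started, for example because the query returned no results. A defensive guard along the following lines would let the script produce an empty feed instead of failing. This is a sketch only: the helper name `getTweetItems` is hypothetical, and the handling of a single non-array result is an assumption about the JSON shape.

```javascript
// Return the list of tweet nodes, or an empty array when the query returned nothing.
function getTweetItems(jsonTweets) {
  if (!jsonTweets || !jsonTweets.li) {
    return [];
  }
  // Assumption: a single result may arrive as an object rather than an array.
  return Array.isArray(jsonTweets.li) ? jsonTweets.li : [jsonTweets.li];
}

// The loop from the error report then becomes safe for null input:
var items = getTweetItems(null);
for (var i = 0; i < items.length; i++) {
  // process items[i]
}
```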

Rewrite scraping logic

The parsing logic currently traverses the JSON using JavaScript code. This breaks easily whenever Twitter changes the structure of its site's HTML. A more robust way to parse the site would be to rely on the CSS classes of the desired information entities (they are provided by Twitter) and to use regular expressions for finding and parsing these entities in the JSON.

As an example for such regex parsing look at the code used here: http://www.labnol.org/internet/twitter-rss-feed/28149/
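
A minimal sketch of the class-based approach, applied to raw HTML rather than JSON for brevity, could look like this. The class name `tweet-text` and the helper name `extractByClass` are assumptions for illustration; only `js-stream-item` appears in this project's query. A regular expression is a heuristic here, not a full HTML parser.

```javascript
// Collect the text of every <p> element whose class list contains className.
// Assumes className contains no regular expression metacharacters.
function extractByClass(html, className) {
  var re = new RegExp(
    '<p\\b[^>]*\\sclass="(?:[^"]*\\s)?' + className +
    '(?:\\s[^"]*)?"[^>]*>([\\s\\S]*?)</p>', 'g');
  var results = [];
  var match;
  while ((match = re.exec(html)) !== null) {
    // Strip nested tags such as <a> and keep only the visible text.
    results.push(match[1].replace(/<[^>]+>/g, '').trim());
  }
  return results;
}

var sampleHtml =
  '<li class="js-stream-item"><p class="TweetTextSize tweet-text">' +
  'Hello <a href="#">#world</a></p></li>' +
  '<li class="js-stream-item"><p class="tweet-text">Second</p></li>' +
  '<p class="other">ignored</p>';
var texts = extractByClass(sampleHtml, 'tweet-text');
```

Because the match depends only on the class attribute, it keeps working when elements are re-nested or reordered, as long as the class name itself survives.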

Move away from Yahoo YQL

The Yahoo YQL queries have returned no results for the past two days. Finding the cause is difficult, because the same query still works in the YQL Console, though only very rarely.
Judging from their forums and other reports on the net, this happens once in a while and might be a load balancer bug, as it occurs only in some geographical regions.

To avoid depending on such random outages, which can persist for quite some time, implement the YQL queries yourself or replace YQL with a different library or service.
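
One possible replacement is to fetch the timeline page directly with Apps Script's `UrlFetchApp` and parse the HTML in the script itself. The sketch below assumes the public timeline is served as static HTML at `https://twitter.com/<user>`; the function names `timelineUrl` and `fetchTimelineHtml` and the injectable `fetchFn` parameter are hypothetical and exist only to keep the example testable outside Apps Script.

```javascript
// Build the public timeline URL for a user.
function timelineUrl(user) {
  return 'https://twitter.com/' + encodeURIComponent(user);
}

// Fetch the timeline HTML without going through YQL.
// fetchFn is optional; by default the Apps Script UrlFetchApp service is used.
function fetchTimelineHtml(user, fetchFn) {
  fetchFn = fetchFn || function (url) {
    var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
    if (response.getResponseCode() !== 200) {
      throw new Error('Twitter returned HTTP ' + response.getResponseCode());
    }
    return response.getContentText();
  };
  return fetchFn(timelineUrl(user));
}
```

This removes the intermediate service, at the cost of having to select the tweet elements in the script instead of with the XPath expression in the YQL query.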
