Giter Site home page Giter Site logo

mjunaidi / readable-proxy Goto Github PK

View Code? Open in Web Editor NEW

This project forked from n1k0/readable-proxy

0.0 2.0 0.0 58 KB

Node proxy server attempting to fetch readable contents from any provided URL.

Home Page: http://readable-proxy.herokuapp.com/

JavaScript 84.15% HTML 15.22% CSS 0.63%

readable-proxy's Introduction

readable-proxy

Build Status Dependency Status

Proxy server to retrieve a readable version of any provided url, powered by Node, PhantomJS and Readability.js.

Installation

$ git clone https://github.com/n1k0/readable-proxy
$ cd readable-proxy
$ npm install

Run

Starts server on localhost:3000:

$ npm start

Note about CORS: by design, the server will allow any origin to access it, so browsers can consume it from pages hosted on a different domain.

Configuration

By default, the proxy server will use the Readability.js version it ships with; to override this, you can set the READABILITY_LIB_PATH environment variable to the absolute path to the library file on your local system:

$ READABILITY_LIB_PATH=/path/to/my/own/version/of/Readability.js npm start

Usage

Web UI

Just head to http://localhost:3000/, enter some URL and start enjoying both original and readable renderings side by side.

REST/JSON API

The HTTP Rest API is available under /api.

Disclaimer: Truly REST implementation is probably far from being considered achieved.

GET /api/get

Required parameters
  • url: The URL to retrieve retrieve readable contents from, eg. https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/.
Optional parameters
  • sanitize: A boolean string to enable HTML sanitization (valid truthy boolean strings: "1", "on", "true", "yes", "y"; everything else will be considered falsy):
  • userAgent: A custom User Agent string. By default, it will use the PhantomJS one.

Note: Enabling contents sanitization loses Readability.js specific HTML semantics, though is probably safer for users if you plan to publish retrieved contents on a public website.

Example

Content sanitization enabled:

$ curl http://0.0.0.0:3000/api/get\?sanitize=y&url\=https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/
{
  "byline":"Nicolas Perriault —",
  "content":"<p><strong>So finally you&#39;re <a href=\"https://nicolas.perriault.net/code/2013/testing-frontend-javascript-code-using-mocha-chai-and-sinon/\">testing",
  "length":2867,
  "title":"Get your Frontend JavaScript Code Covered | Code",
  "uri":"https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/",
  "isProbablyReaderable": true
}

Content sanitization disabled (default):

$ curl http://0.0.0.0:3000/api/get\?url\=https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/
{
  "byline":"Nicolas Perriault —",
  "content":"<div id=\"readability-page-1\" class=\"page\"><section class=\"\">\n<p><strong>So finally you're…",
  "length":3851,
  "title":"Get your Frontend JavaScript Code Covered | Code",
  "uri":"https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/",
  "isProbablyReaderable": true
}

Note: the isProbablyReaderable property tells if Readability has determined if page contents were parseable or not.

Usage from node

scrape() function

The scrape function scrapes a URL and returns a Promise with the JSON result object described above:

var scrape = require("readable-proxy").scrape;
var url = "https://nicolas.perriault.net/code/2013/get-your-frontend-javascript-code-covered/";

scrape(url, {sanitize: true, userAgent: "My custom User-Agent string"})
  .then(console.error.log(console))
  .catch(console.error.bind(console));

Tests

$ npm test

License

MPL 2.0.

readable-proxy's People

Contributors

n1k0 avatar almet avatar nickolay avatar salty-horse avatar mattblaha avatar

Watchers

mjunaidi avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.