Scraper

Scraper is a simple web scraper built with Node.js, PhantomJS, and phantom (the Node.js module that drives PhantomJS).

Scraper is based on code from the tutorial "Screen Scraping with Node.js", which gives some background on web scraping and explains the foundations of the code line by line. Definitely give it a read before getting started.

Basically, the best way to scrape JavaScript-heavy pages is with tools that are themselves written in JavaScript and can render a page the way a web browser does. More and more of the web is built from dynamic pages like that, so the content you want often isn't in the raw HTML.

Once you have Node.js installed, just run these commands:

$ git clone git@github.com:selbyk/scraper.git
$ cd scraper
$ npm install
$ node app.js

Install and/or debug until you get this output instead of a sea of errors:

$ node app.js
opened site?  success
{ h2: [ 'Article 1', 'Article 2', 'Article 3' ],
  p:
   [ 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
     'Ut sed nulla turpis, in faucibus ante. Vivamus ut malesuada est. Curabitur vel enim eget purus pharetra tempor id in tellus.',
     'Curabitur euismod hendrerit quam ut euismod. Ut leo sem, viverra nec gravida nec, tristique nec arcu.' ] }
$
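For orientation, here is a minimal sketch of how output like the above could be produced. It is not the project's actual app.js: it assumes the promise-based API of later releases of the phantom module (`create`, `createPage`, `open`, `evaluate`, `exit`), and the names `extractHeadingsAndParagraphs` and `scrape` are made up for illustration.

```javascript
// Runs inside the page, in PhantomJS's own JavaScript engine, so it sticks to
// ES5 syntax and must not reference variables from the Node.js side.
// (Hypothetical helper, not taken from the project's app.js.)
function extractHeadingsAndParagraphs() {
  var result = { h2: [], p: [] };
  Object.keys(result).forEach(function (tag) {
    var nodes = document.querySelectorAll(tag);
    for (var i = 0; i < nodes.length; i++) {
      result[tag].push(nodes[i].textContent);
    }
  });
  return result;
}

// `phantomModule` is expected to be require('phantom'). Passing it in as an
// argument is an assumption of this sketch; it keeps the file loadable even
// before `npm install` has succeeded.
async function scrape(phantomModule, url) {
  const instance = await phantomModule.create();
  try {
    const page = await instance.createPage();
    const status = await page.open(url);
    console.log('opened site? ', status);
    if (status !== 'success') {
      throw new Error('could not open ' + url);
    }
    // The function is serialized, executed in the rendered page, and its
    // return value is sent back to Node.js.
    return await page.evaluate(extractHeadingsAndParagraphs);
  } finally {
    // Always shut down the PhantomJS process, even on failure.
    await instance.exit();
  }
}
```

Assuming phantom is installed, something like `scrape(require('phantom'), 'http://example.com/').then(console.log)` would print the collected `h2` and `p` text; the URL here is only a placeholder.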

I don't know where this project is going, but I have some ideas if you're open to collaboration. You can find me in #sentiment on chat.freenode.net, reach me by e-mail, or stalk me down any other way.

Just don't show up at my apartment at 3 AM. Not cool, bro.

-Selby Kendrick
