headless_browser's Introduction

Headless browser based on WebKit

This tool will help you make your AJAX applications crawlable.

Web pages built with JavaScript MVC libraries can't be indexed by default, because search engines can't (yet) run all the JavaScript code that your page needs to execute in order to show anything. That's why you need a headless browser that fetches the page, runs the JavaScript and outputs the resulting HTML to the crawler, which will then be able to index your page.

The entire project is under 50 lines of code, so you won't have any trouble understanding it. It's based on Qt4, so it should work on every operating system that Qt4 supports. There's also a small example written in PHP, but you're free to use it however you want.

There's one small thing that you should know about this headless browser: it outputs everything as UTF-8. Most of the time this works like a charm, but there are also a lot of pages that don't declare any encoding, or that set one encoding in the headers while the content uses another. Most browsers try to guess the encoding of the pages you open, and that's OK for most pages, but there are some exceptions. That's why I don't want to play the same game of trying to guess the encoding. Instead I'm leaving that part to you, dear developer :)
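
A minimal Python sketch of one way to handle the UTF-8 output on the consuming side. The function name `prepare_snapshot` and the idea of receiving the tool's output as a `bytes` value are assumptions made for illustration; they are not part of this project. The sketch decodes strictly first, falls back to replacement characters if the bytes are not valid UTF-8, and declares the charset explicitly so the crawler does not have to guess.

```python
from typing import Dict, Tuple


def prepare_snapshot(raw: bytes) -> Tuple[str, Dict[str, str], bool]:
    """Decode rendered HTML as UTF-8 and build headers that declare it.

    `raw` is assumed to be the bytes produced by the headless browser
    (hypothetical input; how you obtain it depends on your setup).
    Returns (html, headers, clean), where `clean` is False when the
    bytes were not valid UTF-8 and had to be repaired.
    """
    try:
        html = raw.decode("utf-8")
        clean = True
    except UnicodeDecodeError:
        # Invalid sequences become U+FFFD so the page can still be served.
        html = raw.decode("utf-8", errors="replace")
        clean = False

    # Always state the encoding explicitly instead of letting clients guess.
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return html, headers, clean
```

When `clean` comes back as `False`, the source page probably declared one encoding and used another, which is the case you may want to log or handle separately.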

TIP: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
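
Under the scheme described at that link, a crawler that finds `http://example.com/#!key=value` requests `http://example.com/?_escaped_fragment_=key=value` instead. The Python sketch below shows one way to recognise such a request and recover the `#!` URL that should be handed to the headless browser. The function name `to_hashbang_url` is hypothetical and not part of this project.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_hashbang_url(request_url: str) -> Optional[str]:
    """Map a crawler request back to the '#!' URL it stands for.

    Returns None when the URL carries no `_escaped_fragment_` parameter,
    meaning the request can be served normally.
    """
    parts = urlsplit(request_url)
    fragment = None
    kept = []
    # parse_qsl also percent-decodes the values the crawler escaped.
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "_escaped_fragment_":
            fragment = value
        else:
            kept.append((key, value))

    if fragment is None:
        return None

    # Rebuild the URL: other query parameters stay, the fragment gets '!'.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "!" + fragment)
    )
```

A request handler could call this first and only start the headless browser when the result is not `None`.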

### Install

This project relies heavily on Socket-IPC. To get this project and the required submodules you'll need git. Make sure to run this:

git clone https://github.com/alexandernst/headless_browser.git headless_browser
cd headless_browser
git submodule init
git submodule update

Contributors: alexandernst, perroverd
