integrii / headlesschrome

A Go package for working with headless Chrome. Run interactive JavaScript commands on web pages with Go and Chrome.

License: MIT License

Go 100.00%
chrome headless go testing scraper macos cli package

headlesschrome's Introduction

DO NOT USE! This project does not work anymore due to changes in Chrome. Use the Chrome DevTools protocol in chromedp instead!

headlessChrome 🤖

Only Ubuntu on Docker is supported for now. macOS does not appear to work. 😬

A Go package for working with headless Chrome. Run interactive JavaScript commands on pages with Go and Chrome, without a GUI. It includes a few helper functions out of the box to query and click elements by their classes, divs, or HTML content.

You could use this package to click buttons and scrape content from a website as if you were a browser, or to render pages that tools such as PhantomJS or CasperJS cannot handle. It is especially useful for sites that use EmberJS, where the content is rendered by JavaScript after the HTML payload is delivered.

Examples

An example project that does some simple things with a Makefile and Dockerfile is in the examples directory.

Install

go get github.com/integrii/headlessChrome

Documentation

http://godoc.org/github.com/integrii/headlessChrome

Docker Version

To run headless Chrome with Docker, check out examples/docker/main.go as well as examples/docker/Makefile. From that directory, run make test to build and run the container with the example app inside. The source of httpbin.org is printed at the end of the build and run.

Custom Flags

By default, we start up with the bare minimum flags necessary to launch headless Chrome and open a JavaScript console. If you want more flags, such as a window size or a custom User-Agent, add them to the Args variable. Be sure to append to it rather than overwrite it, so the default flags are kept.

headlessChrome.Args = append(headlessChrome.Args, "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36")
headlessChrome.Args = append(headlessChrome.Args, "--window-size=1024,768")

Changing the Path to Chrome

Change the path to Chrome by simply setting the headlessChrome.ChromePath variable.

headlessChrome.ChromePath = `/opt/google/chrome-unstable/chrome`

JavaScript Helper Examples

Find the full list in the docs.

// click some span element from the page by its text content
browser.ClickItemWithInnerHTML("span", "Google Search", 0)

// select the content of something by its CSS classes
browser.GetContentOfItemWithClasses("button arrow bold", 0)
time.Sleep(time.Second) // give it a second to query

// read the selected stuff from the console by picking
// the next item from the output channel
fmt.Println(<-browser.Output)

Contributing

Please send pull requests! It would be good to have support for more operating systems or more handy helpers to run more commonly used javascript code easily. Adding support for other operating systems should be as simple as checking the platform type and changing the ChromePath variable's default value.
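
The platform check mentioned above might look roughly like the following sketch. The function name defaultChromePath and the specific paths are illustrative assumptions; the package's real defaults may differ, and installations vary between machines.

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultChromePath returns a guess at the Chrome binary location for the
// given GOOS value. The paths are common install locations used here as
// assumptions; they are not taken from the headlessChrome package.
func defaultChromePath(goos string) string {
	switch goos {
	case "darwin":
		return "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
	case "windows":
		return `C:\Program Files\Google\Chrome\Application\chrome.exe`
	default:
		// Linux and anything else falls back to the usual package path.
		return "/usr/bin/google-chrome"
	}
}

func main() {
	// In the package, this value would be the default for
	// headlessChrome.ChromePath; here it is only printed.
	fmt.Println(defaultChromePath(runtime.GOOS))
}
```

Taking the GOOS value as a parameter, rather than reading runtime.GOOS inside the function, keeps the lookup easy to test on any platform.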

headlesschrome's Issues

Error pty.Start

..\github.com\integrii\interactive\session.go:86:15: undefined: pty.Start

How can I take a screenshot using headlessChrome

Hello,
Please add an example of how to take a screenshot at a specific resolution.

Here is what I've tried on my local Ubuntu machine, but I get no output:

package main

import (
	"fmt"
	"github.com/integrii/headlessChrome"
)

func main() {
	headlessChrome.ChromePath = `/usr/bin/google-chrome`

	// set some additional arguments for when starting chrome
	headlessChrome.Args = append(headlessChrome.Args, "--disable-gpu")
	headlessChrome.Args = append(headlessChrome.Args, "--screenshot")
	headlessChrome.Args = append(headlessChrome.Args, "--window-size=1280,1024")
	// make a new session
	browser, err := headlessChrome.NewBrowser(`http://httpbin.org`)
	if err != nil {
		panic(err)
	}
	// Close the browser process when this func returns
	defer browser.Exit()

	// loop over all the output that came from the output channel
	// and print it to the console
	for len(browser.Output) > 0 {
		fmt.Println(<-browser.Output)
	}
}

Document.documentElement.outerHTML got null

My config: I am using Chrome Canary.

headlessChrome.ChromePath = `/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary`

I get this error on macOS. It seems the browser does not start up in headless mode.

case <-startupTime.C:
	debug("ERROR: Browser failed to start before browser startup time cutoff")
	chromeSession.ForceClose() // force close the session because it failed
	err = errors.New("Chrome console failed to init in the alotted time")
	return &chromeSession, err

But if I remove "--headless" from the args, Chrome opens a window right away.

Unable to get content for some urls

I'm unable to get any page content for some URLs, such as newegg.com or facebook.com.

I simply copied the example from the README on GitHub and changed the URL.

// make a new session
browser, err := headlessChrome.NewBrowser(`http://newegg.com`)
if err != nil {
	panic(err)
}
// Close the browser process when this func returns
defer browser.Exit()

// sleep while content is rendered. You could replace this
// with some javascript that only returns when the
// content exists to be safer.
time.Sleep(time.Second * 5)

// Query all the HTML from the web site
browser.Write(`document.documentElement.outerHTML`)
time.Sleep(time.Second)

// loop over all the output that came from the output channel
// and print it to the console
for len(browser.Output) > 0 {
	fmt.Println(<-browser.Output)
}

What causes the page content not to be retrieved for some URLs?
Thanks.
