integrii / headlesschrome

A Go package for working with headless Chrome. Run interactive JavaScript commands on web pages with Go and Chrome.

License: MIT License

Go 100.00%
chrome headless go testing scraper macos cli package

headlesschrome's Introduction

DO NOT USE! This project does not work anymore due to changes in Chrome. Use the Chrome DevTools protocol in chromedp instead!

headlessChrome 🤖

Only Ubuntu on Docker is supported for now. Mac does not appear to be working. 😬

A Go package for working with headless Chrome. Run interactive JavaScript commands on pages with Go and Chrome, without a GUI. A few helper functions are included out of the box to query and click selector paths by their classes, divs, or HTML content.

You could use this package to click buttons and scrape content from a website as if you were a browser, or to render pages that tools like PhantomJS or CasperJS do not support. It is especially useful for sites that use EmberJS, where the content is rendered by JavaScript after the HTML payload is delivered.

Examples

An example project that does some simple things with a Makefile and Dockerfile is in the examples directory.

Install

go get github.com/integrii/headlessChrome

Documentation

http://godoc.org/github.com/integrii/headlessChrome

Docker Version

To run Chrome headless with Docker, check out examples/docker/main.go as well as examples/docker/Makefile. From that directory, run make test to build and run the container with the example app inside. The source of httpbin.org is displayed at the end of the build and run.

Custom Flags

By default, we start up with the bare minimum flags necessary to start headless Chrome and open a JavaScript console. If you want more flags, such as a window size or a custom User-Agent, add them to the Args variable. Be sure to append to it rather than overwrite it, so the default flags are kept:

headlessChrome.Args = append(headlessChrome.Args, "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36")
headlessChrome.Args = append(headlessChrome.Args, "--window-size=1024,768")

Changing the Path to Chrome

Change the path to Chrome by simply setting the headlessChrome.ChromePath variable.

headlessChrome.ChromePath = `/opt/google/chrome-unstable/chrome`

JavaScript Helper Examples

Find the full list in the docs.

// click a span element on the page by its text content
browser.ClickItemWithInnerHTML("span", "Google Search", 0)

// select the content of an element by its CSS classes
browser.GetContentOfItemWithClasses("button arrow bold", 0)
time.Sleep(time.Second) // give it a second to query

// read the selected stuff from the console by picking
// the next item from the output channel
fmt.Println(<-browser.Output)

Contributing

Please send pull requests! It would be good to have support for more operating systems or more handy helpers to run more commonly used javascript code easily. Adding support for other operating systems should be as simple as checking the platform type and changing the ChromePath variable's default value.
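
The platform check described above might look roughly like the sketch below. defaultChromePath is a hypothetical helper, not something this package provides, and both paths are assumptions about typical installs (the Linux one matches the path used in an issue report further down).

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultChromePath returns a guess at where Chrome lives for the given
// GOOS value. Both paths are assumptions; adjust them for your install.
func defaultChromePath(goos string) string {
	switch goos {
	case "darwin":
		// Assumed location of a standard Chrome install on macOS.
		return "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
	default:
		// Assumed location on Ubuntu and similar Linux systems.
		return "/usr/bin/google-chrome"
	}
}

func main() {
	path := defaultChromePath(runtime.GOOS)
	// In a real program this value would be assigned to
	// headlessChrome.ChromePath before starting a browser.
	fmt.Println("would set headlessChrome.ChromePath to:", path)
}
```

Taking the OS name as a parameter, rather than reading runtime.GOOS inside the function, keeps the lookup easy to test on any machine.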


headlesschrome's Issues

How can I take a screenshot using headlessChrome

Hello,
Kindly add an example of how to take a screenshot at a certain resolution.

Here is what I've tried on my local Ubuntu machine, but I get no output:

package main

import (
	"fmt"
	"github.com/integrii/headlessChrome"
)

func main() {
	headlessChrome.ChromePath = `/usr/bin/google-chrome`

	// set some additional arguments for when starting chrome
	headlessChrome.Args = append(headlessChrome.Args, "--disable-gpu")
	headlessChrome.Args = append(headlessChrome.Args, "--screenshot")
	headlessChrome.Args = append(headlessChrome.Args, "--window-size=1280,1024")
	// make a new session
	browser, err := headlessChrome.NewBrowser(`http://httpbin.org`)
	if err != nil {
		panic(err)
	}
	// Close the browser process when this func returns
	defer browser.Exit()

	// loop over all the output that came from the output channel
	// and print it to the console
	for len(browser.Output) > 0 {
		fmt.Println(<-browser.Output)
	}
}

Document.documentElement.outerHTML got null

My config: I used Chrome Canary.

headlessChrome.ChromePath = `/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary`

I got this error on macOS. It seems the browser didn't start up in headless mode.

case <-startupTime.C:
	debug("ERROR: Browser failed to start before browser startup time cutoff")
	chromeSession.ForceClose() // force close the session because it failed
	err = errors.New("Chrome console failed to init in the alotted time")
	return &chromeSession, err

But if I remove "--headless" from the args, Chrome opens a window right away.

Error pty.Start

..\github.com\integrii\interactive\session.go:86:15: undefined: pty.Start

Unable to get content for some urls

I'm unable to get any page content for some urls, such as Newegg.com or facebook.com.

I simply copied the example from the readme in GitHub, and changed the url.

// make a new session
browser, err := headlessChrome.NewBrowser(`http://newegg.com`)
if err != nil {
	panic(err)
}
// Close the browser process when this func returns
defer browser.Exit()

// sleep while content is rendered. You could replace this
// with some javascript that only returns when the
// content exists to be safer.
time.Sleep(time.Second * 5)

// Query all the HTML from the web site
browser.Write(`document.documentElement.outerHTML`)
time.Sleep(time.Second)

// loop over all the output that came from the output channel
// and print it to the console
for len(browser.Output) > 0 {
	fmt.Println(<-browser.Output)
}

What is causing some urls not to retrieve the page content?
Thanks.
