chromedp / chromedp
A faster, simpler way to drive browsers supporting the Chrome DevTools Protocol.
License: MIT License
@clanstyles Could you provide the compiled headless Chrome Docker image you mentioned in #2? I think building in connection options for that would help people, especially if they don't have to compile canary themselves.
How would we snapshot the DOM as it was? But this doesn't give us an actual snapshot of the DOM?
func Nodes(sel interface{}, nodes *[]*cdp.Node, opts ...QueryOption) Action
func NodeIDs(sel interface{}, ids *[]cdp.NodeID, opts ...QueryOption) Action
Example in README build error
"undefined: cdp.FrameHandler"
Maybe change to "cdp.Handler"?
This is a soooooooo great job! I love it.
And I have a question: how do I set up a proxy?
I am running on a CPU-constrained machine so Chrome takes longer than 3s to boot up. However cdp.New always times out if Chrome hasn't booted within 3s. Could you allow setting this value? This issue is making cdp.New fail probably 50% of the time on my server.
https://github.com/knq/chromedp/blob/ddaa7bc4e77f7c8b35b9754548eaa49635cf9ade/chromedp.go#L86
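One way to avoid a fixed boot deadline is to let the caller choose it through a context. The following is a minimal sketch using only the standard library; `waitForPort` is a hypothetical helper, not part of chromedp, and a local listener stands in for Chrome's debugging port so the program runs on its own.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// waitForPort polls addr until a TCP connection succeeds or ctx expires.
// Hypothetical helper for illustration; chromedp does not export it.
func waitForPort(ctx context.Context, addr string, interval time.Duration) error {
	var d net.Dialer
	for {
		conn, err := d.DialContext(ctx, "tcp", addr)
		if err == nil {
			return conn.Close()
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("waiting for %s: %w", addr, ctx.Err())
		case <-time.After(interval):
		}
	}
}

// run starts a stand-in listener (assumption: it plays the role of Chrome)
// and waits for it with a caller-chosen timeout instead of a fixed 3s.
func run() error {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer l.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	return waitForPort(ctx, l.Addr().String(), 200*time.Millisecond)
}

func main() {
	if err := run(); err != nil {
		fmt.Println("browser not ready:", err)
		return
	}
	fmt.Println("debugging port is accepting connections")
}
```

On a slow machine the caller would simply pass a longer timeout to `context.WithTimeout`.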
$ go run main.go
2017/01/27 11:58:47 fork/exec /usr/bin/google-chrome: no such file or directory
exit status 1
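Several reports in this list come down to a hard-coded browser path that does not exist on the machine. As a rough sketch of an alternative, the standard library's `exec.LookPath` can search `PATH` for a list of candidate names. The `findExecutable` helper and the candidate names are assumptions for illustration, not chromedp's actual lookup.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

var errNotFound = errors.New("no candidate executable found on PATH")

// findExecutable returns the first candidate that exec.LookPath resolves.
// Hypothetical helper; not part of chromedp.
func findExecutable(candidates ...string) (string, error) {
	for _, name := range candidates {
		if path, err := exec.LookPath(name); err == nil {
			return path, nil
		}
	}
	return "", errNotFound
}

func main() {
	// Candidate names are assumptions; adjust them for the target system.
	path, err := findExecutable("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("using", path)
}
```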
Is it possible to use this tool like browsertime analogue for collecting sites performance metrics?
I have a page with JavaScript that logs to the console. When I navigate to the page using chromedp with logging enabled, I can see that the Runtime.consoleAPICalled event is being fired. I'd like to subscribe to this event, but my naive approach isn't working. Any tips?
c.Run(ctxt, cdp.ActionFunc(func(z context.Context, h cdptypes.Handler) error {
	ch3 := h.Listen(cdptypes.EventRuntimeConsoleAPICalled)
	go func() {
		x := <-ch3
		fmt.Println("\n\nGOT A THING3", x)
	}()
	return nil
}))
c.Run(ctxt, cdp.Navigate(file))
time.Sleep(time.Second)
I think it's caused by the SendKeys method.
panic: runtime error: slice bounds out of range
goroutine 759 [running]:
github.com/knq/chromedp.(*TargetHandler).domEvent(0xc4200a62c0, 0xb391a0, 0xc42052c0c0, 0x892140, 0xc42052ada0)
/home/pah/ws/go/src/github.com/knq/chromedp/handler.go:604 +0xf69
created by github.com/knq/chromedp.(*TargetHandler).processEvent
/home/pah/ws/go/src/github.com/knq/chromedp/handler.go:264 +0x323
exit status 2
How would we go about setting the browser's height and width?
Is there a way to upload a file via a form?
I want to get all sub-nodes of an element, using the code below:
func search(nodes *[]*cdptypes.Node) cdp.Tasks {
	return cdp.Tasks{
		cdp.Navigate(`https://www.****.com`),
		cdp.Sleep(2 * time.Second),
		cdp.Nodes(`#lt-center > #MOP > #odds-tbl-containers > #sc1 > #s1`, nodes),
		cdp.ActionFunc(func(context.Context, cdptypes.FrameHandler) error {
			return nil
		}),
	}
}
The nodes slice has only one element. Its node.ChildNodeCount is 30, but node.Children does not contain any sub-nodes.
Chrome 59 has cross-platform headless support. It allows running Chrome in a headless/server environment.
To use it via the DevTools remote debugging protocol, start a normal Chrome binary with the --headless command line flag (Linux-only for now):
$ google-chrome --headless --disable-gpu --remote-debugging-port=9222 https://www.google.fr
How can I tell chromedp to send the --headless flag, along with other flags?
macOS Sierra 10.12.2
Chrome installed at the following location: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
Running the following using the path hard-coded into chromedp/runner/path_darwin.go:
$ open "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
Successfully opens Chrome instance, so path is correct.
From within my program, I cannot allocate from pool, resulting in the error message:
fork/exec : no such file or directory
Running the unit tests results in:
--- FAIL: TestNavigate (0.00s)
chromedp_test.go:38: fork/exec : no such file or directory
FAIL
exit status 1
FAIL github.com/knq/chromedp 0.023s
Let me know if you need more details.
P.S. The new Pool feature seems bang-on!
When I attempt to use page.PrintToPDF().Do() I get the following error:
2017/06/13 16:06:52 -> {"error":{"code":-32000,"message":"PrintToPDF is not implemented"},"id":87}
2017/06/13 16:06:52 PrintToPDF is not implemented (-32000)
I've tried this against Chrome build 59, Canary and the latest build. I was able to print to pdf from the cli with:
~/Downloads/chrome-mac/Chromium.app/Contents/MacOS/Chromium --headless --disable-gpu --print-to-pdf=test.pdf http://google.com/
The network package's SetExtraHTTPHeaders method accepts a type of Headers. But it is an empty struct with no fields. It seems like it should really be a map[string]interface{} if there are no properties in the protocol definition. Otherwise, how can we set extra HTTP headers?
Some JavaScript frameworks like jQuery have complex drop-downs. How would you click them? Example: http://labs.abeautifulsite.net/jquery-dropdown/#1
I've thought of two ways. One is to identify the menu and "send keys", but then how do you press "down" on the arrow keys a few times? The other is X,Y coordinates, but I'm not sure how you'd calculate the position of the item.
So far chromedp does not have much support for other browsers, such as Edge.
Murlok might be a good companion to help bridge this.
https://github.com/murlokswarm
There are drivers for webview-based browsers: so far OSX and Windows (super beta), with Android and iOS planned.
Murlok is Go-based and aims to let Go programmers run HTML-based apps easily within webviews on all desktop and mobile platforms.
I would be curious what you think.
I'm looking through the code trying to figure out how to generate a HAR archive (with Chrome) for a given URL although I can't seem to find it. Is it possible to accomplish this?
attrs := map[string]string{
"class": "foo",
"href": "bar",
}
<element class="foo" href="bar"/>
<element class foo href bar/>
It also returns InvalidCharacterError (-32000) when an attribute name is numeric.
Is it possible to reuse a Chrome window that was started before my Go program?
Tends to eat 400% CPU
Probably caused by https://github.com/knq/chromedp/blob/8eb44961e2c3fbb7b3b302184114c2eb2059c4b8/handler.go#L397
go version: 1.8 (ubuntu 16)
go test -v
=== RUN TestNavigate
=== RUN TestNavigationEntries
=== RUN TestNavigateToHistoryEntry
=== RUN TestNavigateBack
=== RUN TestNavigateForward
=== RUN TestStop
=== RUN TestReload
=== RUN TestCaptureScreenshot
=== RUN TestAddOnLoadScript
=== RUN TestRemoveOnLoadScript
=== RUN TestLocation
=== RUN TestTitle
=== RUN TestNodes
=== RUN TestNodeIDs
=== RUN TestFocusBlur
=== RUN TestDimensions
=== RUN TestText
=== RUN TestClear
=== RUN TestClear/test_0
=== RUN TestClear/test_1
=== RUN TestClear/test_2
=== RUN TestClear/test_3
=== RUN TestClear/test_4
=== RUN TestClear/test_5
=== RUN TestClear/test_6
=== RUN TestClear/test_7
=== RUN TestClear/test_8
=== RUN TestClear/test_9
2017/02/22 16:34:55 error: pool could not start runner on port 9000: fork/exec : no such file or directory
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7e0886]
goroutine 34 [running]:
testing.tRunner.func1(0xc420108dd0)
/usr/local/go/src/testing/testing.go:622 +0x29d
panic(0x8667e0, 0xb35770)
/usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/knq/chromedp.(*CDP).Wait(0x0, 0x0, 0xc420049ca8)
/home/idawson/gohome/src/github.com/knq/chromedp/chromedp.go:140 +0x26
github.com/knq/chromedp.(*Res).Release(0xc420102740, 0x0, 0x0)
/home/idawson/gohome/src/github.com/knq/chromedp/pool.go:148 +0x53
github.com/knq/chromedp.(*Pool).Allocate(0xc4200189b0, 0xb12f60, 0xc420015400, 0x0, 0x0, 0x0, 0x0, 0xb0dfa0, 0xc42010c930)
/home/idawson/gohome/src/github.com/knq/chromedp/pool.go:88 +0x58b
github.com/knq/chromedp.testAllocate(0xc420108dd0, 0x8e8405, 0x9, 0x10)
/home/idawson/gohome/src/github.com/knq/chromedp/chromedp_test.go:15 +0x75
github.com/knq/chromedp.TestClear.func1(0xc420108dd0)
/home/idawson/gohome/src/github.com/knq/chromedp/query_test.go:212 +0x6f
testing.tRunner(0xc420108dd0, 0xc420104220)
/usr/local/go/src/testing/testing.go:657 +0x96
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:697 +0x2ca
exit status 2
FAIL github.com/knq/chromedp 0.008s
Hi,
I was trying to use chromedp along with the Docker-based example and had some issues, specifically a panic which seems to happen when trying to shut down the client.
This should reproduce the issue:
package main

import (
	"context"

	cdp "github.com/knq/chromedp"
	"github.com/knq/chromedp/client"
)

func main() {
	ctx := context.Background()

	// create chrome instance
	c, _ := cdp.New(ctx,
		cdp.WithTargets(client.New().WatchPageTargets(ctx)),
	)

	// shutdown chrome
	c.Shutdown(ctx)
}
Hello,
Is there a way to evaluate JavaScript code? It's a fairly common task among all the browser automation tools I have used, but I didn't find it in chromedp (and the lack of docs didn't help either).
Thanks,
Howe
Yes, you can use chromedp like that. If you just do chromedp.New() and don't pass it a chromedp/runner.Runner instance, then it will launch a new, isolated chrome instance for you, using the default options. You could use chromedp to manage multiple tabs at once, but I would not recommend that, as there are issues/problems when chrome does rendering "off screen" (basically the tab is suspended).
Based on what you mentioned here, I think dealing with Chrome similarly to a sql database makes a lot of sense. Based on the size of the Chrome node / container, it should be able to performantly handle N instances. I think the library should accept requests and queue them up, waiting for completion. Something along these lines:
func (p *Pool) Exec(ctx context.Context, tasks cdp.Tasks) (*Response, error)
Thoughts?
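The queueing idea can be sketched with a buffered channel acting as a counting semaphore: `Exec` blocks until one of N slots is free or the context is cancelled. Everything here is hypothetical, including the `Task` type, which stands in for `cdp.Tasks`, and the string result, which stands in for `*Response`. It illustrates the shape of the proposal rather than chromedp's Pool.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Task stands in for a list of browser actions (hypothetical).
type Task func(ctx context.Context) (string, error)

// Pool limits how many tasks run at once (hypothetical, not chromedp.Pool).
type Pool struct {
	slots chan struct{}
}

// NewPool creates a pool that runs at most n tasks concurrently.
func NewPool(n int) *Pool {
	return &Pool{slots: make(chan struct{}, n)}
}

// Exec waits for a free slot, runs the task, and releases the slot.
func (p *Pool) Exec(ctx context.Context, t Task) (string, error) {
	select {
	case p.slots <- struct{}{}:
	case <-ctx.Done():
		return "", ctx.Err()
	}
	defer func() { <-p.slots }()
	return t(ctx)
}

// runDemo pushes five tasks through a pool with two slots.
func runDemo() []string {
	pool := NewPool(2)
	results := make([]string, 5)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out, err := pool.Exec(context.Background(), func(ctx context.Context) (string, error) {
				return fmt.Sprintf("task %d done", i), nil
			})
			if err != nil {
				out = "error: " + err.Error()
			}
			results[i] = out
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	for _, r := range runDemo() {
		fmt.Println(r)
	}
}
```

Callers that should not wait indefinitely for a slot can pass a context with a deadline.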
Hi there, when I try the demo using ubuntu 16.04 I get the following error message.
2017/01/27 09:48:21 timeout waiting for initial target
exit status 1
I think it may be related to the library not finding chrome's executable?
When I check the path of chrome I see the following:
lucas@lucas-notebook:~/go$ ls -lah /usr/bin/google-chrome
lrwxrwxrwx 1 root root 31 abr 25 2016 /usr/bin/google-chrome -> /etc/alternatives/google-chrome
Hi,
When I use headless mode, if the tab wasn't properly shut down, the page target stays there forever and I can't clear the cookies to isolate the test cases. The CloseByID method isn't implemented, and AddNewTarget is also unusable, as it panics and returns a syntax error.
I'm wondering if there's a way to initialize a new target every time before I run a test case.
Thanks,
Xindong
Hi,
Is there any way to set a value in an iframe and submit it?
Input actions, such as SendKeys, will appear to fail under high load at times. The explanation is below:
SendKeys simply queues key dispatch events to Chrome, but Chrome is only queuing those events, and the debugging protocol command result will return after the event has been queued, but not necessarily after the event has been processed by Chrome. The simple workaround, at this time, is to do a brief sleep after input events (i.e., SendKeys) on input fields, and/or not use SendKeys for setting the values of input/textarea/etc. fields.
The long term and "correct" fix would be to wait (either via Chrome's Runtime domain, or other mechanism) for Chrome to have processed the actual event. As this requires a significant amount of work, this is of lower priority at the moment.
Note: while I have not encountered it yet, one could surmise that any time Chrome is under load (i.e., doing animation or some such), various input events will end up being queued and processed at a later time. As such, this is likely worth the necessary development effort to fix, but doing so is definitely not trivial.
I'm most excited to see this in conjunction with headless Chrome, which is currently in canary. I don't see any reason it should work differently, since the dev tools are one of two ways to integrate, but I wanted to at least put it on the table in case it affects development of the library.
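Instead of a fixed sleep, a caller can poll for the observable effect of the input (for example, the field's value) until it appears or a deadline passes. The sketch below shows only the generic polling loop with the standard library; `waitUntil` is a hypothetical helper, and the counter-based condition stands in for a real check against the page.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitUntil calls cond every interval until it reports true, returns an
// error, or ctx is done. Hypothetical helper; not part of chromedp.
func waitUntil(ctx context.Context, interval time.Duration, cond func() (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		ok, err := cond()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// demo uses a stand-in condition that becomes true on the third check,
// as if the browser had finished processing queued key events by then.
func demo() (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	checks := 0
	err := waitUntil(ctx, 10*time.Millisecond, func() (bool, error) {
		checks++
		return checks >= 3, nil
	})
	return checks, err
}

func main() {
	n, err := demo()
	fmt.Println("checks:", n, "err:", err)
}
```

The same loop also bounds the wait when a page never finishes loading, because the context deadline ends it.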
https://chromium.googlesource.com/chromium/src/+/lkgr/headless/README.md
More Resources:
https://bugs.chromium.org/p/chromium/issues/detail?id=546953
https://news.ycombinator.com/item?id=11839303
Sometimes I open a URL that fails to load, and then the script deadlocks. How can I solve this problem?
Hi,
I was wondering if it is possible to get the complete rendered html for a given website.
Thanks.
This is fixed by creating a directory called c:\temp, but this is not the usual tmp directory for Windows. It should be %USERPROFILE%\AppData\Local\ or user assignable.
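A portable way to pick the directory is to let the standard library resolve the platform's temp location. In the sketch below, `os.MkdirTemp` with an empty first argument creates a fresh directory under `os.TempDir()`, which on Windows comes from `%TMP%`, `%TEMP%`, or `%USERPROFILE%`. The `newProfileDir` name and the `chromedp-runner-` prefix are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// newProfileDir creates a unique directory under the system temp location.
// Hypothetical helper; the prefix is an arbitrary choice.
func newProfileDir() (string, error) {
	return os.MkdirTemp("", "chromedp-runner-")
}

// demo creates the directory, confirms it exists, and removes it again.
func demo() (bool, error) {
	dir, err := newProfileDir()
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	info, err := os.Stat(dir)
	if err != nil {
		return false, err
	}
	return info.IsDir(), nil
}

func main() {
	ok, err := demo()
	fmt.Println("created temp dir:", ok, "err:", err)
}
```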
I am looking at the cdp package and cannot find any way to do it ;)
The network.SetCookie method seems to be broken.
The issue seems to be the JSON serialization of the ExpirationDate field of the SetCookieParams struct.
According to the DevTools Protocol docs, the expirationDate parameter given to the Network.setCookie method is of type Network.Timestamp, which is defined as the number of seconds since epoch, but the current implementation seems to serialize it as the number of seconds since boot.
Also, the parameter should be optional, but in my tests the expirationDate field was always present in the JSON serialization of the Network.setCookie DevTools call, even when no value had been set on the SetCookieParams.ExpirationDate field (I haven't looked into it, but I suppose this might have something to do with Go's time.Time not allowing null values).
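Both points in this report can be illustrated with plain `encoding/json`: a pointer field with `omitempty` is dropped when unset, and a `time.Time` can be converted to seconds since the Unix epoch before encoding. The `cookieParams` struct below is a hypothetical stand-in, not the generated `SetCookieParams` type.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// cookieParams is a hypothetical stand-in for the protocol parameters.
type cookieParams struct {
	Name  string `json:"name"`
	Value string `json:"value"`
	// A nil pointer is omitted from the JSON, making the field optional.
	ExpirationDate *float64 `json:"expirationDate,omitempty"`
}

// epochSeconds converts t to seconds since the Unix epoch.
func epochSeconds(t time.Time) float64 {
	return float64(t.Unix()) + float64(t.Nanosecond())/1e9
}

// demoJSON encodes one cookie with an expiry and one without.
func demoJSON() (withExpiry, withoutExpiry string) {
	exp := epochSeconds(time.Unix(1500000000, 0))
	a, _ := json.Marshal(cookieParams{Name: "sid", Value: "abc", ExpirationDate: &exp})
	b, _ := json.Marshal(cookieParams{Name: "sid", Value: "abc"})
	return string(a), string(b)
}

func main() {
	a, b := demoJSON()
	fmt.Println(a) // {"name":"sid","value":"abc","expirationDate":1500000000}
	fmt.Println(b) // {"name":"sid","value":"abc"}
}
```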
./build.sh
go generate
qtc: 2017/01/28 21:01:41 Compiling *.qtpl template files in directory "templates"
qtc: 2017/01/28 21:01:41 Compiling "templates/domain.qtpl" to "templates/domain.qtpl.go"...
qtc: 2017/01/28 21:01:41 error when parsing file "templates/domain.qtpl": error in "func CommandTemplate(c *internal.Type, d *internal.Domain, domains []*internal.Domain)": error in "if len(c.Parameters) != 0": error in "for _, p := range c.Parameters": error in "if !p.Optional": unexpected tag found after "continue": "end" at file "templates/domain.qtpl", line 24, pos 105, token "end", last line "{% if len(c.Paramete" ... "% continue %}{% end "
main.go:3: running "qtc": exit status 1
On SendKeys, the window is focused.
Is it possible to fix this? It is hard to bear, and headless Chrome doesn't work with a SOCKS proxy (does it?).
2017/06/04 17:18:51 exec: "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe": file does not exist
Is it possible to use chromedp with headless Chrome running as a background process? E.g.:
// https://chromium.googlesource.com/chromium/src/+/lkgr/headless/README.md
// --disable-gpu currently required, see link above
google-chrome --headless --hide-scrollbars --remote-debugging-port=9222 --disable-gpu &
If yes, can you explain how can I do that?
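A Chrome process started with `--remote-debugging-port=9222` also serves a small HTTP API on that port, and its `/json/list` endpoint returns the open targets along with their WebSocket debugger URLs. The sketch below only lists those targets with the standard library and does not show how chromedp itself attaches. The `target` struct and helper names are assumptions, and the address assumes Chrome is running locally as in the command above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// target holds the fields of interest from one /json/list entry.
type target struct {
	ID                   string `json:"id"`
	Type                 string `json:"type"`
	Title                string `json:"title"`
	URL                  string `json:"url"`
	WebSocketDebuggerURL string `json:"webSocketDebuggerUrl"`
}

// parseTargets decodes the JSON array returned by /json/list.
func parseTargets(data []byte) ([]target, error) {
	var ts []target
	if err := json.Unmarshal(data, &ts); err != nil {
		return nil, err
	}
	return ts, nil
}

// listTargets fetches the target list from a running browser.
func listTargets(baseURL string) ([]target, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/json/list")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseTargets(data)
}

func main() {
	targets, err := listTargets("http://127.0.0.1:9222")
	if err != nil {
		fmt.Println("could not reach Chrome:", err)
		return
	}
	for _, t := range targets {
		fmt.Printf("%s %s %s\n", t.Type, t.URL, t.WebSocketDebuggerURL)
	}
}
```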
var err error

// create context
ctxt, cancel := context.WithCancel(context.Background())
defer cancel()

// create chrome instance
c, err := cdp.New(ctxt, cdp.WithLog(log.Printf), cdp.WithRunnerOptions(
	runner.WindowSize(1920, 1080),
))
if err != nil {
	log.Fatal(err)
}
This is my code, but it can't maximize the window. I couldn't find an API for maximizing the window. How can I maximize it?
I am having an issue with getting the chromedp.Click action to work. It seems to work fine when running in headless mode, but without headless mode the click does not register at all. Nothing happens in the browser when the click is supposedly made, and chromedp just continues on to the next action in the task list.
I am running Arch Linux using LightDM display manager and bspwm window manager. The exact same program was confirmed working correctly on a coworker's machine running Linux Mint 18 Cinnamon and on both Chromium 57 and 61.
Below is some version information and the debug log from chromedp, I am using the current master branch of chromedp (commit 91303cb).
~ ▸ chromium --version
Chromium 59.0.3071.86
~ ▸ go version
go version go1.8.3 linux/amd64
~ ▸ go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/branch/go/"
GORACE=""
GOROOT="/usr/lib/go"
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build214944291=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
chromedp<- {"id":97,"method":"DOM.performSearch","params":{"query":"#loginForm\\:loginButton"}}
chromedp-> {"id":97,"result":{"searchId":"19084.12","resultCount":1}}
chromedp<- {"id":98,"method":"DOM.getSearchResults","params":{"searchId":"19084.12","fromIndex":0,"toIndex":1}}
chromedp-> {"method":"DOM.setChildNodes","params":{"parentId":65,"nodes":[{"nodeId":77,"parentId":65,"backendNodeId":79,"nodeType":1,"nodeName":"INPUT","localName":"input","nodeValue":"","childNodeCount":0,"children":[],"attributes":["id","loginForm:loginButton","type","submit","name","loginForm:loginButton","value","Sign in","style","display: block; margin: 30px auto; text-align: center; background-color: #69be28; border-color: #69be28; width: 100%;","class","btn btn-lg btn-primary"],"shadowRoots":[{"nodeId":78,"backendNodeId":80,"nodeType":11,"nodeName":"#document-fragment","localName":"","nodeValue":"","childNodeCount":1,"children":[{"nodeId":79,"parentId":78,"backendNodeId":81,"nodeType":3,"nodeName":"#text","localName":"","nodeValue":"Sign in"}],"shadowRootType":"user-agent"}]},{"nodeId":80,"parentId":65,"backendNodeId":82,"nodeType":1,"nodeName":"P","localName":"p","nodeValue":"","childNodeCount":1,"attributes":[]},{"nodeId":81,"parentId":65,"backendNodeId":83,"nodeType":1,"nodeName":"P","localName":"p","nodeValue":"","childNodeCount":1,"attributes":[]}]}}
chromedp-> {"id":98,"result":{"nodeIds":[77]}}
chromedp<- {"id":99,"method":"DOM.getBoxModel","params":{"nodeId":77}}
chromedp-> {"id":99,"result":{"model":{"content":[224,617,1006,617,1006,652,224,652],"padding":[200,602,1030,602,1030,667,200,667],"border":[199,600,1031,600,1031,668,199,668],"margin":[199,555,1031,555,1031,713,199,713],"width":555,"height":45}}}
chromedp<- {"id":100,"method":"Runtime.evaluate","params":{"expression":"(function(a) {\n\t\treturn a[0].offsetParent !== null\n\t})($x('/html[1]/body[1]/div[1]/div[2]/div[2]/form[1]/div[1]/div[3]/input[1]'))","objectGroup":"console","includeCommandLineAPI":true,"returnByValue":true}}
chromedp-> {"id":100,"result":{"result":{"type":"boolean","value":true}}}
chromedp<- {"id":101,"method":"Runtime.evaluate","params":{"expression":"(function(a) {\n\t\ta[0].scrollIntoViewIfNeeded(true);\n\t\treturn [window.scrollX, window.scrollY];\n\t})($x('/html[1]/body[1]/div[1]/div[2]/div[2]/form[1]/div[1]/div[3]/input[1]'))","objectGroup":"console","includeCommandLineAPI":true,"returnByValue":true}}
chromedp-> {"id":101,"result":{"result":{"type":"object","value":[0,0]}}}
chromedp<- {"id":102,"method":"DOM.getBoxModel","params":{"nodeId":77}}
chromedp-> {"id":102,"result":{"model":{"content":[224,617,1006,617,1006,652,224,652],"padding":[200,602,1030,602,1030,667,200,667],"border":[199,600,1031,600,1031,668,199,668],"margin":[199,555,1031,555,1031,713,199,713],"width":555,"height":45}}}
chromedp<- {"id":103,"method":"Input.dispatchMouseEvent","params":{"type":"mousePressed","x":615,"y":634,"button":"left","clickCount":1}}
chromedp-> {"id":103,"result":{}}
chromedp<- {"id":104,"method":"Input.dispatchMouseEvent","params":{"type":"mouseReleased","x":615,"y":634,"button":"left","clickCount":1}}
chromedp-> {"id":104,"result":{}}
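The coordinates in the Input.dispatchMouseEvent calls above (615, 634) are consistent with the center of the "content" quad returned by DOM.getBoxModel, truncated to integers. The sketch below reproduces that arithmetic. It is an inference from the log rather than a statement about chromedp's implementation, and `quadCenter` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// quadCenter averages the four (x, y) corner points of a box model quad,
// given as eight numbers: x1, y1, x2, y2, x3, y3, x4, y4.
func quadCenter(quad []float64) (x, y float64, err error) {
	if len(quad) != 8 {
		return 0, 0, errors.New("quad must hold 4 points (8 numbers)")
	}
	for i := 0; i < 8; i += 2 {
		x += quad[i]
		y += quad[i+1]
	}
	return x / 4, y / 4, nil
}

func main() {
	// The "content" quad from the DOM.getBoxModel result in the log.
	content := []float64{224, 617, 1006, 617, 1006, 652, 224, 652}
	x, y, err := quadCenter(content)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(int64(x), int64(y)) // prints "615 634"
}
```

Since the computed point matches the dispatched events, the coordinates themselves look correct in this log.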
Depending on the server, on page load the next page can be one of two. I need to be able to figure out which one that is.
If I use something like WaitVisible or QueryAfter (which pauses) it doesn't always work. I also have to wait for a page to finish loading. So #1 detecting the page loaded and #2 not knowing what page it is makes it hard.
Do you have any recommendations?
I'm making a custom action for this page's interactions specifically.
I'm trying to write a basic recursive link checker.
I've been unable to distinguish valid 200 responses from 404s, or even from the case when the connection fails (e.g. trying to Navigate to "https://domain.not-existant-at-all.com").
In both cases I get a valid DOM to query (for a 404 it's what the website returns, for a connection failure it's Chrome's "can't connect" page).
I've looked at the examples and perused the code and didn't find a way to access this information.
It looks like it would require looking at EventNetworkResponseReceived and the associated EventResponseReceived and its Response.Status.
I know this is tricky, because the DOM tree doesn't necessarily correspond to a single HTTP request, but in the common case when it does, maybe the API could expose the Response struct associated with a given frame and a way to access it?
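Until the status is exposed through the browser, a link checker can get the three outcomes (200, 404, connection failure) from a plain HTTP request made outside of Chrome. This swaps in `net/http` for the DevTools network events discussed above, so it will not see anything that depends on JavaScript. `checkLink` is a hypothetical helper, and the test server exists only to make the example self-contained.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// checkLink issues a HEAD request and reports the HTTP status code.
// A non-nil error means no HTTP response was received at all
// (DNS failure, refused connection, timeout).
func checkLink(client *http.Client, url string) (int, error) {
	resp, err := client.Head(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// demo checks an existing page, a missing page, and a server that is gone.
func demo() (okStatus, missingStatus int, connErr error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/missing" {
			http.NotFound(w, r)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	client := &http.Client{Timeout: 5 * time.Second}

	okStatus, _ = checkLink(client, srv.URL+"/")
	missingStatus, _ = checkLink(client, srv.URL+"/missing")

	// Shut the server down to simulate a connection failure.
	srv.Close()
	_, connErr = checkLink(client, srv.URL+"/")
	return okStatus, missingStatus, connErr
}

func main() {
	ok, missing, err := demo()
	fmt.Println("existing page:", ok)
	fmt.Println("missing page:", missing)
	fmt.Println("connection failure:", err != nil)
}
```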
So, I'll admit that I haven't had a chance to dig very deeply into the code/docs of chromedp yet, but from my cursory look at the current examples, it didn't really seem like any covered one of the first use-cases for this awesome repo that came to my mind: wget --mirror-like functionality.
Unless I've misunderstood what I've come across so far, chromedp could be great for mirroring websites much like wget -m does (recursively requesting all links to download all assets locally), with two additional amazing benefits:
Just wanted to throw this out there to see if it might make for a valid use-case/example for chromedp. Either way, super cool repo, thanks for putting it out there :)
I'm trying to create a custom task.
func Newaction(sel string, opts ...chromedp.QueryOption) chromedp.Action {
	log.Println("querying...")
	return chromedp.QueryAfter(sel, func(ctxt context.Context, h cdp.Handler, nodes ...*cdp.Node) error {
		log.Println("something after happened")
		if len(nodes) == 0 {
			return fmt.Errorf("not found...")
		}
		return fmt.Errorf("dsfasdf")
	}, opts...)
}
QueryAfter is never called.
Trying to build my program against my fork of chromedp, get the following error:
chromedp/chromedp.go:137: cannot use h (type *TargetHandler) as type cdp.Handler in assignment:
*TargetHandler does not implement cdp.Handler (wrong type for Execute method)
have Execute(context.Context, cdp.MethodType, easyjson.Marshaler, easyjson.Unmarshaler) error
want Execute(context.Context, cdp.MethodType, easyjson.RawMessage) <-chan interface {}
Is this something currently in flux?
https://news.ycombinator.com/item?id=14101233
Hi, I just wish to know whether headless support is guaranteed, in relation to the link above. It's a new feature in Chrome.
Thank you.
I'm using the task system to run a chunk of tasks, then check for "completion" in another function. I'm trying to wait for the page to change and for data to exist on the page, or to check whether the URL has changed.
What's the best pattern for this? Even after you click a button, knowing how long to wait for a new page is hard. The page has no unique elements, so waiting on the current page's elements (WaitVisible) won't work.