Comments (15)

kenshaw commented on May 28, 2024

Because the package has to enable it for the base actions to work out of the box. As such, there's no need for an additional call.

from chromedp.

kenshaw commented on May 28, 2024

@edoardottt I'd need to see the rest of the script to really understand what you're doing. I quickly wrote this just now, and it properly returns the JavaScript value:

package main

import (
	"context"
	"flag"
	"fmt"
	"os"

	"github.com/chromedp/cdproto/network"
	"github.com/chromedp/chromedp"
)

func main() {
	urlstr := flag.String("url", "https://google.com/", "url")
	flag.Parse()
	if err := run(context.Background(), *urlstr); err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		os.Exit(1)
	}
}

func run(ctx context.Context, urlstr string) error {
	ctx, cancel := chromedp.NewContext(ctx)
	defer cancel()

	const script = `"a string"`

	headers := network.Headers{
		"x-header": "a header",
	}
	var res string
	err := chromedp.Run(ctx,
		network.Enable(),
		network.SetExtraHTTPHeaders(headers),
		chromedp.Navigate(urlstr),
		chromedp.EvaluateAsDevTools(script, &res),
	)
	fmt.Fprintf(os.Stdout, "err: %v\ngot: %q\n", err, res)
	return err
}

Running it:

$ go run main.go 
err: <nil>
got: "a string"

kenshaw commented on May 28, 2024

@edoardottt I don't inject a lot of headers when scraping with chromedp, as I'm usually not doing anything that would "need" a full-blown Chrome instance when manipulating the headers directly. However, if I had to guess, it's maybe because you're sending a non-string type as the value? Please note that network.Headers is rightly a map[string]interface{}, which corresponds to the generic JSON object type.

From the PDL in the Chromium source tree, you can see that the Network.Headers type is defined like this:

  # Network domain allows tracking network activities of the page. It exposes information about http,
  # file, data and other requests and responses, their headers, bodies, timing, etc.
  domain Network
    depends on Debugger
    depends on Runtime
    depends on Security

    # Request / response headers as keys / values of JSON object.    
    type Headers extends object    

I am inferring here that Chrome is rejecting the set header request because of badly formatted data. I would try changing the values you're sending to strings. This is likely causing a silent error that you're not catching in your script.

Apologies if this is the case, but chromedp/cdp more or less respects the defined protocol. I'll do some testing on my end to see if this is the likely cause.
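A minimal sketch of the "send only string values" suggestion, assuming the headers arrive as "Name: value" strings (as with the -H flag used later in this thread). parseHeaders is a hypothetical helper, not part of chromedp. It returns a map[string]interface{}, which is the underlying type of network.Headers, so the result could be converted with network.Headers(...).

```go
package main

import (
	"fmt"
	"strings"
)

// parseHeaders turns "Name: value" strings into a map whose values are all
// Go strings. The map type matches the underlying type of network.Headers
// (map[string]interface{}). parseHeaders is a hypothetical helper for
// illustration only.
func parseHeaders(raw []string) (map[string]interface{}, error) {
	headers := make(map[string]interface{}, len(raw))
	for _, h := range raw {
		// Split on the first colon only, so values may contain colons.
		name, value, ok := strings.Cut(h, ":")
		name = strings.TrimSpace(name)
		if !ok || name == "" {
			return nil, fmt.Errorf("invalid header %q, want \"Name: value\"", h)
		}
		headers[name] = strings.TrimSpace(value)
	}
	return headers, nil
}

func main() {
	headers, err := parseHeaders([]string{"test:test", "X-Header: a header"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for name, value := range headers {
		// %T shows that every value is a plain string.
		fmt.Printf("%s = %q (%T)\n", name, value, value)
	}
}
```

Keeping every value a string means the JSON object sent to Chrome contains only string values.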

kenshaw commented on May 28, 2024

I'm looking through your pphack repo that you linked here, and I don't see the specific payload you're trying to inject. Could you share an actual complex script and the actual header values you're injecting?

edoardottt commented on May 28, 2024

Hi @kenshaw, thank you so much for your reply.

I've added a new branch (https://github.com/edoardottt/pphack/tree/add-headers) in order to show how I use the headers in chromedp.

To test, I comment out these two lines (https://github.com/edoardottt/pphack/blob/add-headers/pkg/scan/chrome.go#L50-L51, network.Enable and network.SetExtraHTTPHeaders). Then I run:

echo https://edoardottt.github.io/pp-test/ | go run cmd/pphack/main.go -H "test:test" -v

and the output is https://edoardottt.github.io/pp-test/?constructor.prototype. ... confirming that the err is nil and the JS evaluation is performed correctly.

If instead I keep those two lines (uncommented) and run the same command, I get no standard output and this error:

[ERR] encountered an undefined value

Using a proxy I can see Test: test in the HTTP headers, so I guess the headers are set correctly.

kenshaw commented on May 28, 2024

I'll look at this further. BTW -- if you haven't already, you should try turning on the debug logging to see the messages going back and forth, as it might be helpful:

ectx, ecancel := chromedp.NewExecAllocator(context.Background(), copts...)
pctx, pcancel := chromedp.NewContext(ectx, chromedp.WithDebugf(log.Printf))

(in your project's scan/chrome.go)

kenshaw commented on May 28, 2024

Are you expecting different results based on the User-Agent?

edoardottt commented on May 28, 2024

Regarding the debug output: I've looked at it, but I don't see anything unusual, to be honest. If someone else can make more sense of it, I can provide those logs too. As far as I can tell, the request is sent with the proper headers.

> Are you expecting different results based on the User-Agent?

No, I just want to add some extra headers

kenshaw commented on May 28, 2024

So -- I believe the network.Enable() call is being called, and yours is resetting the UA. From what I can tell from the output, chromedp is working as intended: the CDP messages show everything being sent and received correctly.

Specifically, the error you are getting is because the JS value window.xxxx is not present. That value can't be unmarshaled to a string, as undefined doesn't have a corresponding value in Go that it could be unmarshaled to. The error here should be more of an "invalid destination type" or some such. Note that you could capture the actual raw value and then check after the fact whether it is a string or something else.
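To illustrate the "capture the raw value, then check its type" idea without a browser, here is a small sketch. The typ and raw parameters stand in for the Type and Value fields of a runtime.RemoteObject. jsString is a hypothetical helper, and the inputs in main are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsString decodes a raw evaluation result into a Go string.
// typ and raw stand in for the Type and Value fields of a
// runtime.RemoteObject. The bool result is false when the JS value was
// not a string (for example undefined), which is not treated as an error.
// jsString is a hypothetical helper for illustration.
func jsString(typ string, raw []byte) (string, bool, error) {
	if typ != "string" {
		return "", false, nil
	}
	var s string
	// The raw value is JSON, so a string arrives quoted and must be decoded.
	if err := json.Unmarshal(raw, &s); err != nil {
		return "", false, err
	}
	return s, true, nil
}

func main() {
	s, ok, err := jsString("string", []byte(`"a string"`))
	fmt.Printf("value=%q ok=%v err=%v\n", s, ok, err)

	// An undefined result carries no value at all.
	s, ok, err = jsString("undefined", nil)
	fmt.Printf("value=%q ok=%v err=%v\n", s, ok, err)
}
```

With this split, the caller can decide whether a missing value is an error or simply "nothing found".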

edoardottt commented on May 28, 2024

> So -- I believe the network.Enable() call is being called, and yours is resetting the UA.

So I have to use something like SetUserAgentOverride for this, but that's not the point here...

> window.xxxx is not present

How? Why would setting an extra HTTP header like Test: test change the JS evaluation of a static website? I still don't understand.

I've checked and this behavior is present in other similar tools, e.g. https://github.com/kosmosec/proto-find/

kenshaw commented on May 28, 2024

I have no idea why it's not present. You can play around with this code:

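func Scan(ctx context.Context, headers map[string]interface{}, js, targetURL string) (string, error) {
	// Capture the raw remote object, so a non-string result (such as
	// undefined) does not fail while unmarshaling.
	var res *runtime.RemoteObject
	err := chromedp.Run(ctx, chromedp.Tasks{
		network.SetExtraHTTPHeaders(network.Headers(headers)),
		chromedp.Navigate(targetURL),
		chromedp.EvaluateAsDevTools(js, &res),
	})

	var s string
	// res stays nil if Run failed before the evaluation completed.
	if err == nil && res != nil && res.Type == runtime.TypeString { // this is also just "string"
		s = string(res.Value) // raw JSON, so the string is still quoted
	}
	log.Printf("s: %q -- %v", s, err)
	return s, err
}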

Unfortunately, I'm not able to dig further into your code. Please update here if you find the issue.

edoardottt commented on May 28, 2024

Okay, thanks for your help though.

Why did you remove network.Enable? Is it not necessary?

edoardottt commented on May 28, 2024

@kenshaw Using the debug output, I found something:

In the second one there's the error:

2024/02/05 09:42:53 <- {"method":"Runtime.exceptionThrown","params":{"timestamp":1.707122573427766e+12,"exceptionDetails":{"exceptionId":1,"text":"Uncaught","lineNumber":244,"columnNumber":2,"scriptId":"4","url":"https://rawcdn.githack.com/alrusdi/jquery-plugin-query-object/9e5871fbb531c5e246aac2aaf056b237bc7cc0a6/jquery.query-object.js","stackTrace":{"callFrames":[{"functionName":"","scriptId":"4","url":"https://rawcdn.githack.com/alrusdi/jquery-plugin-query-object/9e5871fbb531c5e246aac2aaf056b237bc7cc0a6/jquery.query-object.js","lineNumber":244,"columnNumber":2}]},"exception":{"type":"object","subtype":"error","className":"ReferenceError","description":"ReferenceError: jQuery is not defined\n    at https://rawcdn.githack.com/alrusdi/jquery-plugin-query-object/9e5871fbb531c5e246aac2aaf056b237bc7cc0a6/jquery.query-object.js:245:3","objectId":"7735756507232330443.2.1","preview":{"type":"object","subtype":"error","description":"ReferenceError: jQuery is not defined\n    at https://rawcdn.githack.com/alrusdi/jquery-plugin-query-object/9e5871fbb531c5e246aac2aaf056b237bc7cc0a6/jquery.query-object.js:245:3","overflow":false,"properties":[{"name":"stack","type":"string","value":"ReferenceError: jQuery is not defined\n    at https\u2026c2aaf056b237bc7cc0a6/jquery.query-object.js:245:3"},{"name":"message","type":"string","value":"jQuery is not defined"}]}},"executionContextId":2}},"sessionId":"5EB446D1FB128D2499FB60BC9B58875C"}

It seems that using the header changes something and jQuery does not load properly. TBH, it's hard to believe this is a problem with the website, as it's static and always returns the same content.
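For anyone hunting for the same kind of clue, here is a rough sketch that pulls the script URL and error description out of a debug log line carrying a Runtime.exceptionThrown event. The struct mirrors only the JSON fields visible in the message above. pageException is a hypothetical helper, and the sample line in main is a shortened, made-up stand-in with an example.com URL.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// exceptionMessage mirrors only the fields of a Runtime.exceptionThrown
// message that are needed here.
type exceptionMessage struct {
	Method string `json:"method"`
	Params struct {
		ExceptionDetails struct {
			URL       string `json:"url"`
			Exception struct {
				Description string `json:"description"`
			} `json:"exception"`
		} `json:"exceptionDetails"`
	} `json:"params"`
}

// pageException reports the script URL and error description when line is
// a debug log line containing a Runtime.exceptionThrown message.
// pageException is a hypothetical helper for illustration.
func pageException(line string) (url, desc string, ok bool) {
	// Skip the timestamp and direction prefix in front of the JSON payload.
	i := strings.Index(line, "{")
	if i < 0 {
		return "", "", false
	}
	var msg exceptionMessage
	if err := json.Unmarshal([]byte(line[i:]), &msg); err != nil {
		return "", "", false
	}
	if msg.Method != "Runtime.exceptionThrown" {
		return "", "", false
	}
	d := msg.Params.ExceptionDetails
	return d.URL, d.Exception.Description, true
}

func main() {
	// Shortened, made-up sample in the same shape as the log line above.
	const sample = `2024/02/05 09:42:53 <- {"method":"Runtime.exceptionThrown","params":{"exceptionDetails":{"url":"https://example.com/plugin.js","exception":{"description":"ReferenceError: jQuery is not defined"}}}}`

	if url, desc, ok := pageException(sample); ok {
		fmt.Printf("exception in %s: %s\n", url, desc)
	}
}
```

Running each debug line through a check like this makes page-side JavaScript errors easier to spot.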

kenshaw commented on May 28, 2024

Ok, glad you were able to figure it out!

edoardottt commented on May 28, 2024

I've got that clue, but I'm not able to solve the issue, @kenshaw.
As I wrote, I guess it's related to chromedp, but I don't know how to fix that behavior.

Hence, the issue should not be closed
