Chrome Devtools Kotlin

An asynchronous coroutine-based multiplatform Kotlin client for the Chrome DevTools Protocol.

Protocol version & Code generation

Part of this client is generated based on information from the "latest" (a.k.a "tip-of-tree") JSON descriptors found in the ChromeDevTools/devtools-protocol repository. All the domains' commands and events defined in these protocol descriptors are therefore available in chrome-devtools-kotlin, as well as their doc and deprecated/experimental status.

The protocol definitions are automatically updated daily, but releases of chrome-devtools-kotlin are still manual. If you're missing some APIs or updates, don't hesitate to open an issue to request a new release with updated protocol.

You can find the protocol version used by chrome-devtools-kotlin after the - in the version number. For instance, version 1.3.0-861504 of chrome-devtools-kotlin was built using version 861504 of the Chrome DevTools Protocol (this is technically the Chromium revision, but it gives the tip-of-tree version number of the protocol).
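The version scheme described above can be read mechanically. Here is a minimal sketch, assuming the protocol revision is always the numeric suffix after the last `-`. The name `protocolRevisionOf` is a hypothetical helper for illustration, not part of chrome-devtools-kotlin:

```kotlin
// Hypothetical helper (not part of chrome-devtools-kotlin): extracts the protocol
// revision from a library version such as "1.3.0-861504".
fun protocolRevisionOf(libraryVersion: String): Int? =
    libraryVersion
        .substringAfterLast('-', missingDelimiterValue = "") // "" when there is no '-'
        .toIntOrNull()                                        // null when the suffix is not a number
```

For the version quoted above, `protocolRevisionOf("1.3.0-861504")` yields `861504`, and a version without a numeric suffix yields `null`.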

Concepts

Domains

The Chrome Devtools Protocol defines domains that expose some commands and events. They are subsets of the protocol's API. You can find the list and documentation of all domains in the protocol's web page.

This library defines a type for each domain (e.g. PageDomain, StorageDomain...), which exposes:

  • a suspend method for each command, accepting a request type and returning a response type, respectively containing the input and output parameters defined in the protocol for that command.
  • additional suspend functions for each command with a more convenient signature.
  • a method for each type of event, returning a Flow of this particular event type
  • an events() method, returning a Flow of all events of the domain

The domains usually also expose an enable() command which is required to enable events. Without calling it, you will receive no events in the Flow subscriptions.

Targets

Clients can interact with different parts of Chrome such as pages (tabs), service workers, and extensions. These parts are called targets. The browser itself is also a target.

Each type of target supports a different subset of the domains defined in the protocol.

Sessions

The protocol requires you to attach to a target in order to interact with it. Attaching to a target opens a target session of the relevant type, such as BrowserSession or PageSession.

When connecting to Chrome, a browser target is automatically attached, thus you obtain a BrowserSession. You can then use this session to attach to other targets (child targets), such as pages.

Each of the supported domains is defined as a property of the session type, which provides a type-safe way to know if the attached target supports a given domain. For instance, PageSession.dom gives access to the DOM domain in this page session, which allows you to issue commands and listen to DOM events.

Note: The supported set of domains for each target type is not clearly defined by the protocol, so I have to regularly extract this information from Chromium's source code itself and update my own extra definition file: target_types.json.

Because of this, there might be some missing domains on some session types at some point in time that require manual adjustment. If this is the case, use the PageSession.unsafe() method on the session object to get full access to all domains (also, please open an issue so I can fix the missing domain).
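To make the idea concrete, here is a deliberately simplified model of how a session type can restrict the available domains at compile time while still offering an escape hatch. All the types below are stand-ins declared for this sketch (the real generated types are much richer), and the assumption that `storage` is missing from the page session is purely illustrative:

```kotlin
// Simplified stand-ins for illustration only, not the library's real types.
interface DOMDomain
interface PageDomain
interface StorageDomain

// Every domain of the protocol, without any compile-time restriction.
interface AllDomains {
    val dom: DOMDomain
    val page: PageDomain
    val storage: StorageDomain
}

// A page session only exposes the domains listed for page targets.
// (Assumption for this sketch: 'storage' is missing from that list.)
class PageSessionModel(private val all: AllDomains) {
    val dom: DOMDomain get() = all.dom
    val page: PageDomain get() = all.page

    // Escape hatch: gives up the compile-time check and exposes every domain.
    fun unsafe(): AllDomains = all
}
```

With this model, `session.storage` does not compile, whereas `session.unsafe().storage` does, at the cost of a possible runtime failure if the target doesn't actually support the domain.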

Quick Start

Add the dependency

This library is available on Maven Central.

It requires a Ktor engine to work, so make sure to add one that supports the web sockets feature (check the compatibility table). For example, you can pick the CIO engine by adding this engine to your dependencies next to chrome-devtools-kotlin:

dependencies {
    implementation("org.hildan.chrome:chrome-devtools-kotlin:$version")
    implementation("io.ktor:ktor-client-cio:$ktorVersion")
}

Get a browser running

This library doesn't provide a way to start a browser programmatically. It assumes a browser was started externally and exposes a debugger server.

For instance, you can start a Headless Chrome browser with the following docker command, which exposes the debugger server at http://localhost:9222:

docker container run -d -p 9222:9222 zenika/alpine-chrome --no-sandbox --remote-debugging-address=0.0.0.0 --remote-debugging-port=9222 about:blank

Connecting to the browser

The starting point of this library is the ChromeDPClient, which is created using the "remote-debugging" URL that was passed to Chrome. You can then open a webSocket() to the browser debugger, which automatically attaches to the browser target and starts a "browser session":

val client = ChromeDPClient("http://localhost:9222")
val browserSession: BrowserSession = client.webSocket()

When you're done with the session, don't forget to close() it in order to close the underlying web socket. You can also use the familiar use { ... } extension, which automatically closes the session in every code path leaving the lambda (even in case of exception).
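The "every code path" guarantee can be illustrated with the standard library's `use` on a stand-in type. This is only a sketch: `FakeSession` and `runAndClose` are hypothetical names, and the library's own `use` may be a separate extension with the same semantics:

```kotlin
import java.io.Closeable

// Stand-in for a session; the real BrowserSession closes a web socket instead.
class FakeSession : Closeable {
    var closed = false
        private set

    override fun close() {
        closed = true
    }
}

// Runs the block and reports whether it succeeded; the session is closed either way.
fun runAndClose(session: FakeSession, block: (FakeSession) -> Unit): Boolean =
    try {
        session.use(block) // behaves like: try { block(session) } finally { session.close() }
        true
    } catch (e: IllegalStateException) {
        false
    }
```

Whether the block returns normally or throws, `session.closed` is `true` once `runAndClose` returns.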

Connecting to targets

The BrowserSession is attached to a browser target, but we're usually interested in more useful targets, such as pages. Once you have your browser session, you can use it to create child page targets and attach to them. Here is an example to create a new page target and attach to it:

ChromeDPClient("http://localhost:9222").webSocket().use { browserSession ->
    browserSession.newPage().use { pageSession ->
        // goto() navigates the current page to the URL and awaits the 'load' event by default
        pageSession.goto("http://example.com")

        // This page session has access to many useful protocol domains (e.g. dom, page...)
        val doc = pageSession.dom.getDocument().root
        val base64Img = pageSession.page.captureScreenshot {
            format = ScreenshotFormat.jpeg
            quality = 80
        }
    } // automatically closes the pageSession
} // automatically closes the browserSession

High level extensions

In addition to the generated domain commands and events, some extensions are provided for higher-level functionality. You can discover them using auto-complete in your IDE.

The most useful of them are documented in this section.

PageSession extensions

  • PageSession.goto(url: String): navigates to a URL and also waits for the next load event, or other events
  • PageSession.clickOnElement(selector: CssSelector, clickDuration: Duration, mouseButton: MouseButton): finds a node using a selector query and simulates a click on it

DOM Domain extensions

  • DOMDomain.getNodeBySelector(selector: CssSelector): NodeId?: finds a node using a selector query, or throws if not found
  • DOMDomain.findNodeBySelector(selector: CssSelector): NodeId?: finds a node using a selector query, or returns null
  • DOMDomain.awaitNodeBySelector(selector: CssSelector, pollingPeriod: Duration): NodeId?: suspends until a node corresponding to the selector query appears in the DOM.
  • DOMDomain.awaitNodeAbsentBySelector(selector: CssSelector, pollingPeriod: Duration): NodeId?: suspends until a node corresponding to the selector query disappears from the DOM.
  • DOMDomain.getTypedAttributes(nodeSelector: CssSelector): DOMAttributes?: gets the attributes of the node corresponding to the given nodeSelector, or null if the selector didn't match any node.
  • DOMDomain.getAttributeValue(nodeSelector: CssSelector, attributeName: String): String?: gets the value of the attribute attributeName of the node corresponding to the given nodeSelector, or null if the node wasn't found.
  • DOMDomain.setAttributeValue(nodeSelector: CssSelector, name: String, value: String): sets the value of the attribute name of the node corresponding to the given nodeSelector.
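The await* functions above are described as polling with a pollingPeriod. The general technique can be sketched as follows. This is a blocking analog written against the standard library only (the library's functions are suspending), and both `pollUntilNotNull` and the `maxAttempts` bound are assumptions added to keep the example self-contained and terminating:

```kotlin
import kotlin.time.Duration

// Blocking analog of the polling idea; not the library's implementation.
// 'probe' stands in for a lookup such as findNodeBySelector (null while the node is absent).
fun <T : Any> pollUntilNotNull(
    pollingPeriod: Duration,
    maxAttempts: Int,
    probe: () -> T?,
): T? {
    repeat(maxAttempts) {
        probe()?.let { return it }                      // found: stop polling
        Thread.sleep(pollingPeriod.inWholeMilliseconds) // not there yet: wait and retry
    }
    return null // gave up after maxAttempts probes
}
```

A shorter polling period reacts faster to DOM changes but issues more protocol requests, which is presumably why the period is exposed as a parameter.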

Page Domain extensions

  • CaptureScreenshotResponse.decodeData(): ByteArray: decodes a base64 encoded image (from PageDomain.captureScreenshot) into bytes
  • PageDomain.captureScreenshotToFile(outputFile: Path, options) (JVM-only)
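For reference, decoding a base64 screenshot payload and writing it to a file boils down to very little code on the JVM. The sketch below uses only JDK classes, and `decodeBase64Image` and `saveBase64Image` are hypothetical names. Since the library is multiplatform, its own decodeData() and captureScreenshotToFile() may be implemented differently:

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.util.Base64

// JVM-only sketch: turns the base64 payload of a screenshot response into raw bytes.
fun decodeBase64Image(base64Data: String): ByteArray =
    Base64.getDecoder().decode(base64Data)

// Decodes the payload and writes the raw image bytes to the given file.
fun saveBase64Image(base64Data: String, outputFile: Path) {
    Files.write(outputFile, decodeBase64Image(base64Data))
}
```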

Runtime Domain extensions

  • <T> RuntimeDomain.evaluateJs(js: String): T?: evaluates JS and returns the result (uses Kotlinx serialization to deserialize JS results)

Examples

Example usage of RuntimeDomain.evaluateJs(js: String):

@Serializable
data class Person(val firstName: String, val lastName: String)

val evaluatedInt = pageSession.runtime.evaluateJs<Int>("42")
assertEquals(42, evaluatedInt)

val evaluatedPerson = pageSession.runtime.evaluateJs<Person>("""eval({firstName: "Bob", lastName: "Lee Swagger"})""")
assertEquals(Person("Bob", "Lee Swagger"), evaluatedPerson)

Note that the deserialization here uses Kotlinx Serialization, which requires annotating the deserialized classes with @Serializable and using the corresponding compiler plugin.

Troubleshooting

Host header is specified and is not an IP address or localhost

Sometimes this error also appears in the form of an HTTP 500.

Chrome doesn't accept a Host header that is neither an IP address nor localhost, but in some environments it might be hard to provide this (e.g. docker services in a docker swarm, communicating using service names).

To work around this problem, simply set overrideHostHeader to true when creating ChromeDPClient. This overrides the Host header to "localhost" in the HTTP requests to the Chrome debugger to make it happy, and also replaces the host in subsequent web socket URLs (returned by Chrome) with the initial host provided in remoteDebugUrl. This is necessary because Chrome uses the Host header to build these URLs, so they would otherwise point to localhost instead of the actual host.
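The URL rewriting part can be pictured with the following sketch. It is an assumption about the general approach, not the library's actual code: `withHostOf` is a hypothetical name, and replacing the port along with the host is my own assumption as well:

```kotlin
import java.net.URI

// Sketch of the rewriting idea: keep the scheme and path that Chrome returned,
// but take the host and port from the URL the client was originally given.
fun withHostOf(webSocketUrl: String, remoteDebugUrl: String): String {
    val ws = URI(webSocketUrl)
    val remote = URI(remoteDebugUrl)
    // java.net.URI reports a null host when it cannot parse one (e.g. names with underscores).
    val host = requireNotNull(remote.host) { "no parsable host in $remoteDebugUrl" }
    return URI(ws.scheme, ws.userInfo, host, remote.port, ws.path, ws.query, ws.fragment).toString()
}
```

For example, a web socket URL built by Chrome for localhost is turned back into one that targets the service name used in remoteDebugUrl.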

License

This project is distributed under the MIT license.

Alternatives

If you're looking for Chrome Devtools Protocol clients in Kotlin, I have only found one other so far, chrome-reactive-kotlin. This is the reactive equivalent of this project. The main differences are the following:

  • it only supports the JVM platform
  • it uses a reactive API (as opposed to coroutines and suspend functions)
  • it doesn't distinguish target types, and thus doesn't restrict the available domains at compile time (it's the equivalent of always using unsafe() in chrome-devtools-kotlin)

I needed a coroutine-based API instead and more type-safety, which is how this chrome-devtools-kotlin project was born.

You can find alternative Chrome DevTools libraries in other languages in this awesome list.

Credits

Special thanks to @wendigo and his project chrome-reactive-kotlin which inspired the creation of this project.

chrome-devtools-kotlin's Issues

Extensions and typing for HTML attributes

The API for HTML attributes in the protocol is pretty bad.
It returns the attributes as a list of strings in the form [key1, value1, key2, value2].

We can do better by providing a Map<String, String> and maybe even a proper type with typed properties for common HTML attributes.
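One possible shape for the Map-based variant, as a sketch only (`attributesToMap` is a hypothetical name, and rejecting odd-sized lists is an assumption about the desired behaviour):

```kotlin
// Converts the protocol's flat [key1, value1, key2, value2, ...] attribute list into a map.
fun attributesToMap(flatAttributes: List<String>): Map<String, String> {
    require(flatAttributes.size % 2 == 0) { "expected alternating names and values" }
    return flatAttributes
        .chunked(2)                                  // [[key1, value1], [key2, value2], ...]
        .associate { (name, value) -> name to value }
}
```

A typed wrapper with properties for common HTML attributes could then be built on top of this map.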

Use JSONObject for type "object" without properties

In the protocol definitions, there are occurrences of both type: "any" and type: "object" without any properties list.

My interpretation is that any can be anything including primitives (effectively any JSON value) but object is limited to only structures with properties (effectively only JSON objects).

In that case, an object without properties list would represent a type whose properties are dynamic (e.g. JSONObject).

Add getTargetInfo convenience method

It's important to be able to retrieve up-to-date information about the current page target.

The way to do so is currently the following:

pageSession.target.getTargetInfo(GetTargetInfoRequest()).targetInfo

It could be nice to have a convenience method for this.

Also, the targetInfo property of the ChromePageSession should be revisited to remove fields that are susceptible to change over the course of the session.

Mark awaitNodeBySelector() as @ExperimentalTime

Kotlin 1.5 brings a breaking change with the Duration class, and this affects the generation of mangled names for all methods using Duration as argument (for instance, the coroutines' delay(Duration) method).

Since awaitNodeBySelector uses Duration as argument, it is not correct to just @OptIn and we should instead mark the API as experimental too.

We should also provide a stable alternative so that people don't have trouble migrating (see #82)

ChromeDPClient instances leak open Ktor Apache HTTP client

There is currently no way to close the inner HTTP client inside a ChromeDPClient.
Also, a new HTTP client is created for each new instance of ChromeDPClient.

Both of these facts lead to thread leaks when creating/destroying a lot of ChromeDPClient instances.

Suppress warnings for generated classes

Generated classes from the protocol use Chrome's experimental and deprecated APIs, generating a bunch of warnings during the build.

It would be nice to suppress these warnings somehow, e.g. suppress deprecations and maybe opt-in for experimental.

A more complicated way (but more accurate) would be to add the experimental/deprecation annotation to the calling property/method/class if possible, which would both suppress the warning and transfer the responsibility to the user of this piece of code.

Add more domains to RenderFrame (page) target

A bunch of domains seem to be available in the zenika/alpine-chrome server, despite not being explicitly visible in the Chromium code.
We should add these domains so that they are accessible in a type-safe manner without an unsafe() call.

Use actual enum types for strings that have a finite set of possible values

Some inputs contain string properties that are in fact enums with a specific set of values. For instance, DispatchMouseEventRequest.type is technically sent as a string on the wire, but is defined as an enum with specific possible values in the protocol.

It would be nice if this was exposed to the user as an enum instead of a string. We would need to make sure that the enum constants are properly defined so that Kotlinx Serialization sends the proper string on the wire.
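As a rough sketch of what this could look like for the mouse event type, using only the standard library: the `wireName` property and `fromWire` function are hypothetical, and with Kotlinx Serialization a `@SerialName` annotation on each constant could play the role of `wireName`:

```kotlin
// Sketch only: the 'type' values of Input.dispatchMouseEvent modelled as an enum.
enum class MouseEventType(val wireName: String) {
    MousePressed("mousePressed"),
    MouseReleased("mouseReleased"),
    MouseMoved("mouseMoved"),
    MouseWheel("mouseWheel");

    companion object {
        // Returns null for values this client doesn't know about (the protocol evolves).
        fun fromWire(value: String): MouseEventType? =
            values().firstOrNull { it.wireName == value }
    }
}
```

Keeping the wire name separate from the constant name lets the Kotlin side follow Kotlin naming conventions while the exact protocol string is still what gets sent.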

Add helper to access "child pages"

Sometimes, actions on a page open new tabs.
It is useful to attach to these new tabs to continue the navigation and maybe go back to the initial page.

Therefore a helper to access a list of targets of type "page" opened by the current page would be nice.

Document all helper extensions

The basic Kotlin APIs for the CDP don't really need documentation because they are defined by the protocol itself.

However, all extensions are invisible to users (apart from IDE autocompletion) unless properly documented.

Provide stable alternative of awaitNodeBySelector() with long millis

Kotlin 1.5 brings a breaking change with the Duration class, and this affects coroutines' delay(Duration) method.
We should provide a stable alternative so that people don't have trouble migrating.

Also, the awaitNodeBySelector(String, Duration) should be marked @ExperimentalTime (see #81)

Web socket connection is not thread safe

It seems that interacting from several page sessions that were spawned by the same browser session (thus sharing the same web socket) is not thread safe.

We can run into this exception:

Send pending
java.lang.IllegalStateException: Send pending
	at java.net.http/jdk.internal.net.http.websocket.WebSocketImpl.sendText(WebSocketImpl.java:182)
	at org.hildan.krossbow.websocket.jdk.Jdk11WebSocketSession.sendText(Jdk11WebSocketClient.kt:105)
	at org.hildan.chrome.devtools.protocol.ChromeDPConnection.request(ChromeDPConnection.kt:27)
	at org.hildan.chrome.devtools.protocol.ChromeDPSession.request(ChromeDPSession.kt:39)
	at org.hildan.chrome.devtools.domains.target.TargetDomain.createBrowserContext(TargetDomain.kt:586)
	at org.hildan.chrome.devtools.targets.TargetExtensionsKt.attachToNewPage(TargetExtensions.kt:47)
	at org.hildan.chrome.devtools.targets.TargetExtensionsKt.attachToNewPage$default(TargetExtensions.kt:44)
	at IntegrationTests$test_parallelPages$1$invokeSuspend$$inlined$use$lambda$1.invokeSuspend(IntegrationTests.kt:103)

This is consistently reproduced by this test case:
https://github.com/joffrey-bion/chrome-devtools-kotlin/blob/main/src/test/kotlin/IntegrationTests.kt#L95
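One way to avoid overlapping sends is to funnel all outgoing frames through a single lock. The sketch below is an illustration of that idea rather than the actual fix: `SerializedSender` and `rawSend` are hypothetical names, and `rawSend` is assumed to block until the underlying send has completed. In coroutine-based code, a Mutex from kotlinx.coroutines would be the suspend-friendly counterpart:

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Sketch: only one send can be in flight at a time, whichever session it comes from.
// 'rawSend' stands in for the underlying web socket send and must block until it completes.
class SerializedSender(private val rawSend: (String) -> Unit) {
    private val lock = ReentrantLock()

    fun send(frame: String) {
        lock.withLock { rawSend(frame) }
    }
}
```

All page sessions sharing a connection would then share one `SerializedSender`, so a second send waits for the previous one instead of failing with "Send pending".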

use() and close() on page sessions should NOT close the web socket

Page sessions are derived from a browser session, and many page sessions could be used in parallel, multiplexed on the same connection.
The most common usage of use() or close() would be to close the target (the page) but not the underlying web socket connection.
We should therefore provide a different API for the page sessions which doesn't close the connection.
Also, the behaviour of use() should be changed to use that API instead.
