WebDriver utility library for Kotlin which encapsulates common logic into simple syntax.
- Exclude the WebDriver reference from code: of course we use WebDriver, so there is no need to write it every time.
- Remove verbose syntax like `WebDriverWait(driver, ...).until(ExpectedConditions.xxx(by))` and `driver.findElement(By.xxx(...))`.
- Add built-in waiting for the presence of an element instead of writing the waiting code yourself.
- Add OCR support for elements that are not represented by text.
- Add Template Matching support for elements that cannot be found by a standard `By` implementation.
```kotlin
class ExampleScenario(driver: WebDriver) : ExtendedWebDriver(driver) {
    fun execute() {
        maximize()
        open("https://github.com/xinaiz/web-driver-support")
        "commits".className.find().trimmedText
        val treeFiles = "file-wrap".className.waitUntilClickable().findAll("tr".tag)
        treeFiles.forEachIndexed { index, elem -> println("$index: ${elem.text}") }
        // Easily get BufferedScreenshot from element
        treeFiles[4].getBufferedScreenshot()
        // Easily find by element attributes
        "g p".attr("data-hotkey").clickWhenClickable()
        // Wait until element is clickable then click it
        "New pull request".linkText.clickWhenClickable()
        open("$currentUrl/master...develop")
        println("blankslate".className.textWhenPresent())
    }
}
```
Old syntax | New syntax |
---|---|
`driver.findElement(By.xxx("abc"))` | `"abc".xxx.find()` |
`driver.findElements(By.xxx("abc"))` | `"abc".xxx.findAll()` |
`try { driver.findElement(By.xxx("abc")) } catch(ex: Throwable) { null }` | `"abc".xxx.findOrNull()` |
Old syntax | New syntax |
---|---|
`parentElement.findElement(By.xxx("abc"))` | `parentElement.find("abc".xxx)` or `"abc".xxx.find(parentElement)` |
`parentElement.findElements(By.xxx("abc"))` | `parentElement.findAll("abc".xxx)` or `"abc".xxx.findAll(parentElement)` |
`try { parentElement.findElement(By.xxx("abc")) } catch(ex: Throwable) { null }` | `parentElement.findOrNull("abc".xxx)` or `"abc".xxx.findOrNull(parentElement)` |
All `WebDriver` methods are available via the `this` context. In addition, many nested methods have been flattened for simpler access. For example:
Old syntax | New syntax |
---|---|
`driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS)` | `implicitWait = 10 to TimeUnit.SECONDS` |
`driver.navigate().back()` | `navigateBack()` |
Old syntax | New syntax |
---|---|
`(driver as JavascriptExecutor).executeScript("script", args)` | `executeScript("script", args)` |
`(driver as JavascriptExecutor).executeAsyncScript("script", args)` | `executeScriptAsync("script", args)` |
`(driver as JavascriptExecutor).executeScript("functionName(arguments[0], arguments[1])", 42, "hello")` | `runFunction("functionName", 42, "hello")` |
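The last row of the table suggests that `runFunction` turns a function name and its arguments into a regular `executeScript` call. A minimal sketch of how such a script string could be assembled is shown here; `buildFunctionCallScript` is a hypothetical helper introduced for illustration and is not taken from the library's source.

```kotlin
// Hypothetical helper, not the library's actual implementation.
// Builds "name(arguments[0], arguments[1], ...)" for the given number of arguments,
// which is the script shape shown in the "Old syntax" column above.
fun buildFunctionCallScript(functionName: String, argumentCount: Int): String {
    require(argumentCount >= 0) { "argumentCount must not be negative" }
    val placeholders = (0 until argumentCount).joinToString(", ") { "arguments[$it]" }
    return "$functionName($placeholders)"
}
```

The resulting string would then be passed to `executeScript` together with the original arguments, so that each `arguments[i]` placeholder receives the matching value.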
TODO: document remaining WebElement JavaScript utility functions
Note: the new wait methods throw on timeout, just like the original `WebDriverWait` does. To avoid that, use the `.orNull()` syntax: when a timeout occurs, `null` is returned as the wait result instead of an exception being thrown.
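The difference between the throwing behaviour and the null-returning behaviour can be sketched with a generic polling helper. This is a plain Kotlin illustration under assumptions, not the library's implementation: `waitUntil`, `waitUntilOrNull` and `WaitTimeoutException` are hypothetical names, and the polling interval is an arbitrary choice.

```kotlin
// Hypothetical exception type used only by this sketch.
class WaitTimeoutException(message: String) : RuntimeException(message)

// Polls the condition until it yields a non-null value; throws when the timeout elapses.
fun <T : Any> waitUntil(timeoutMillis: Long, pollMillis: Long = 50, condition: () -> T?): T {
    val deadline = System.currentTimeMillis() + timeoutMillis
    while (true) {
        val result = condition()
        if (result != null) return result
        if (System.currentTimeMillis() >= deadline) {
            throw WaitTimeoutException("Condition not met within $timeoutMillis ms")
        }
        Thread.sleep(pollMillis)
    }
}

// Same wait, but a timeout is reported as null instead of an exception.
fun <T : Any> waitUntilOrNull(timeoutMillis: Long, pollMillis: Long = 50, condition: () -> T?): T? =
    try {
        waitUntil(timeoutMillis, pollMillis, condition)
    } catch (ex: WaitTimeoutException) {
        null
    }
```

Catching only the timeout exception, rather than every `Throwable`, keeps unrelated failures visible; whether the library makes the same choice is not stated here.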
Old syntax | New syntax |
---|---|
`WebDriverWait(webDriver, 10).until(ExpectedConditions.presenceOfElementLocated(By.xxx("abc")))` | `"abc".xxx.waitUntilPresent(10)` |
`try { WebDriverWait(webDriver, 10).until(ExpectedConditions.presenceOfElementLocated(By.xxx("abc"))) } catch(ex: Throwable) { }` | `"abc".xxx.wait().orNull().untilPresent()` |
There are many utility functions that simplify common expressions. The full expressions remain available when you need them. For example:
Full expression | Shorter expression |
---|---|
`"avatar".id.findOrNull() != null` | `"avatar".id.isPresent()` |
`"button".id.wait(15).untilClickable().click()` | `"button".id.clickWhenClickable(15)` |
`"button".id.wait(15).orNull().untilClickable()?.click()` | `"button".id.clickWhenClickableOrNull(15)` |
If your page has a `canvas` element with inner controls, you normally can't click a specific control, because the controls are not present in the DOM. This library has some support for that case.
To use this functionality, the OpenCV library must be present. You can add it to the project yourself, or use a dependency that handles that for you, for example https://github.com/openpnp/opencv.
The example below shows how to use this functionality:
Let's assume that the `canvas` element has the id `frame`. To find it, you would write:
val canvas = "frame".id.find()
canvas.getBufferedScreenshot()
The screenshot:
Let's say you need to find the guy's face. Take a screenshot and crop it for future use. I will call the cropped image the "template":
Then, you can find that element just like that:
val guyFaces = "/images/guy_face.png".template(canvas).findAll()
Now you are left with 2 elements:
At this point you can click them, or search deeper! Another template:
This time search inside first guy face instead of whole canvas:
val guySmile = "/images/guy_smile.png".template(guyFaces[0]).find()
By default, when you search using the template matching method, WebDriver takes a new screenshot each time. Taking screenshots may be slow if done often. If you don't need the updated state of the canvas every time you search, you can store the screenshot in the utility class `ScreenCache`:
```kotlin
val canvas = "frame".id.find()
var screenCache = canvas.cacheScreen() // screenshot taken
"/images/notification.png".template(screenCache).click() // close some notification
"/images/home_button.png".template(screenCache).click() // click some button
// canvas changed (navigated to different content), screenCache is no longer valid
// now, depending on canvas content implementation, we might need to wait until new page appears
"/images/some_icon_on_home_page.png".template(canvas).waitUntilPresent()
screenCache = canvas.cacheScreen() // create new cache
if (!"/images/statistics_title.png".template(screenCache).isPresent()) {
    "/images/statistics_button.png".template(screenCache).click()
}
// etc
```
In many cases canvas content is not static: animations, overlay effects, lighting changes. In those cases pixel-perfect template matching will fail. To overcome this, an image similarity can be specified. There are currently 5 predefined thresholds:
Name | Value | Description |
---|---|---|
`Similarity.EXACT` | 1.0 | Pixel-perfect match |
`Similarity.PRECISE` | 0.9 | A slightly distorted image, small overlay effects |
`Similarity.DEFAULT` | 0.8 | Default similarity, handles common overlay effects |
`Similarity.DISTORTED` | 0.7 | Highly distorted image, but still recognizable |
`Similarity.LOW` | 0.5 | Danger zone - might find something else |
A custom similarity can also be specified:
"/button.png".template(canvas, similarity = Constants.Similarity.EXACT.value).find()
"/button.png".template(canvas, similarity = Constants.Similarity.PRECISE.value).find()
"/button.png".template(canvas).find()
"/button.png".template(canvas, similarity = Constants.Similarity.DISTORTED.value).find()
"/button.png".template(canvas, similarity = Constants.Similarity.LOW.value).find()
"/button.png".template(canvas, similarity = 0.95).find()
"/button.png".template(canvas, similarity = 0.40).find()
Important: if you use a similarity lower than `Similarity.LOW`, you might find a fish instead of an elephant.
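To make the role of the threshold concrete, here is a simplified sketch of comparing a template against an equally sized image region and accepting the match only when the score reaches the chosen similarity. It is an illustration under assumptions: the scoring formula (one minus the normalized mean absolute difference of grayscale values) is a stand-in and is not the OpenCV-based matching the library relies on, and `SimilaritySketch`, `similarityScore` and `matches` are hypothetical names. Only the threshold values are taken from the table above.

```kotlin
import kotlin.math.abs

// Threshold values copied from the table above; the enum itself is part of this sketch.
enum class SimilaritySketch(val value: Double) {
    EXACT(1.0), PRECISE(0.9), DEFAULT(0.8), DISTORTED(0.7), LOW(0.5)
}

// Simplified score over grayscale pixels (0-255):
// 1.0 for identical images, 0.0 for maximally different ones.
fun similarityScore(template: IntArray, region: IntArray): Double {
    require(template.isNotEmpty() && template.size == region.size) {
        "template and region must be non-empty and of equal size"
    }
    var totalDifference = 0L
    for (i in template.indices) {
        totalDifference += abs(template[i] - region[i])
    }
    return 1.0 - totalDifference.toDouble() / (255.0 * template.size)
}

// A region counts as a match when its score reaches the requested similarity.
fun matches(
    template: IntArray,
    region: IntArray,
    similarity: Double = SimilaritySketch.DEFAULT.value
): Boolean = similarityScore(template, region) >= similarity
```

With a scheme like this, lowering the threshold accepts more distorted regions, which is why very low values start matching unrelated content.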
There are multiple occasions when text cannot be accessed, because it's rendered inside a `canvas` or is part of an image. Because of that, this library also supports recognizing text from images. Currently the `Tesseract` API is used by default, but there is also generic support for any API that converts a `BufferedImage` to a `String`.
As mentioned above, the `Tesseract` API is the default OCR engine. Installation instructions can be found on the Tesseract GitHub page: https://github.com/tesseract-ocr/tesseract. To use it with Web Driver Support, initialization is required:
```kotlin
init {
    ocr.setDatapath("D:\\<tesseract-installation-folder>\\Tesseract-OCR\\tessdata")
    ocr.setConfigs(listOf("quiet")) // disable logs
}
```
The property `ocr` is defined in `ExtendedWebDriver`, and it can be initialized in the `init` block of a class that extends it.
OCR functionality is exposed by both the `ExtendedWebDriver` and `ExtendedWebElement` classes, but the latter is preferred. OCR is performed within the bounds of the target element:
Default OCR, no additional image processing is performed:
"body".tag.find().doOCR() // perform OCR on whole visible page
Threshold OCR. Converts the image to binary (black and white). All pixels below the lightness threshold 180 (scale 0-255) will be black, all above will be white.
"canvas".id.find().doBinaryOCR(treshold = 180)
For text that is hard to distinguish (blending with the background), or if parts of the background are both brighter and darker than the text, both lower and upper lightness bounds can be specified. Pixels between the bounds will become white, and all others will become black.
"canvas".id.find().doBinaryOCR(tresholdMin = 150, tresholdMax = 160)
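The two thresholding variants described above can be sketched on a plain `BufferedImage`. This is a minimal illustration, not the library's preprocessing code: `lightnessOf`, `binarize`, `binarizeAbove` and `binarizeBetween` are hypothetical names, lightness is approximated as the average of the red, green and blue channels, and pixels exactly on a bound are treated as white, which is an assumption the description leaves open.

```kotlin
import java.awt.Color
import java.awt.image.BufferedImage

// Assumption: lightness is approximated as the average of R, G and B (0-255).
fun lightnessOf(rgb: Int): Int {
    val red = (rgb shr 16) and 0xFF
    val green = (rgb shr 8) and 0xFF
    val blue = rgb and 0xFF
    return (red + green + blue) / 3
}

// Produces a black-and-white copy; isWhite decides the colour from each pixel's lightness.
fun binarize(source: BufferedImage, isWhite: (Int) -> Boolean): BufferedImage {
    val result = BufferedImage(source.width, source.height, BufferedImage.TYPE_INT_RGB)
    for (y in 0 until source.height) {
        for (x in 0 until source.width) {
            val white = isWhite(lightnessOf(source.getRGB(x, y)))
            result.setRGB(x, y, if (white) Color.WHITE.rgb else Color.BLACK.rgb)
        }
    }
    return result
}

// Single threshold: pixels at or above the threshold become white, the rest black.
fun binarizeAbove(source: BufferedImage, threshold: Int): BufferedImage =
    binarize(source) { it >= threshold }

// Two bounds: pixels inside [min, max] become white, everything else black.
fun binarizeBetween(source: BufferedImage, min: Int, max: Int): BufferedImage =
    binarize(source) { it in min..max }
```

The binarized image would then be handed to the OCR engine in place of the original screenshot.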
OCR is not perfect and might mistake some characters, for example `8` and `B`. To help with that, an `OCRMode` can be specified. It defines which characters are allowed. Currently there are 3 modes:
Name | Description | Allowed characters |
---|---|---|
`OCRMode.TEXT` | All ASCII characters | All ASCII characters |
`OCRMode.DIGITS` | All digits | `0123456789` |
`OCRMode.CUSTOM` | Custom range | For example `OCRMode.CUSTOM("abcde12345")` |
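One way to think about the three modes is as a mapping from a mode to an optional character whitelist, which an OCR engine such as Tesseract could receive through its `tessedit_char_whitelist` variable. The sketch below only models that mapping; `OcrModeSketch` and `whitelistFor` are hypothetical names, and how the library's own `OCRMode` is implemented is not shown in this document.

```kotlin
// Hypothetical model of the three modes from the table above.
sealed class OcrModeSketch {
    object Text : OcrModeSketch()
    object Digits : OcrModeSketch()
    data class Custom(val characters: String) : OcrModeSketch()
}

// Returns the allowed characters, or null when no restriction applies.
fun whitelistFor(mode: OcrModeSketch): String? = when (mode) {
    is OcrModeSketch.Text -> null
    is OcrModeSketch.Digits -> "0123456789"
    is OcrModeSketch.Custom -> mode.characters
}
```

Restricting recognition to digits, for instance, rules out the `8` versus `B` confusion mentioned above because `B` is no longer a candidate.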
It can be used as follows:
"image".id.find().doOCR(ocrMode = OCRMode.DIGITS)
Besides template matching, there are other new search methods. All of them are defined in the `ExtendedBy` class, which extends Selenium's `By` class (seriously):
Search method | Description | Example |
---|---|---|
`ExtendedBy.classNameList(String)` | Classic WebDriver doesn't allow searching by multiple class names | `ExtendedBy.classNameList("unicode audiolink")` |
`ExtendedBy.attribute(String, String)` | Search by attribute and its value | `ExtendedBy.attribute("value", "quit")` |
`ExtendedBy.template(...)` | Search by image from resources (string path), or by an existing `BufferedImage` | `ExtendedBy.template(Example::class.java, "/images/face.png")` |
`ExtendedBy.value(String)` | Search by the value of the `value` attribute | `ExtendedBy.value("quit")` |
`ExtendedBy.position(Point)` | Returns the element found at a position measured from the top left corner (using JavaScript) | `ExtendedBy.position(Point(100, 200))` |
In addition, there are methods that return proxy elements, which are not real `WebElement`s but are useful in composition with Template Matching, OCR, and position-related code:
Search method | Description | Example |
---|---|---|
`ExtendedBy.rectangle(Rectangle)` | Returns an element that is a proxy of a real `WebElement`, but is bounded by a rectangle inside it. Very useful when performing Template Matching / OCR in a specific area of the parent element | `ExtendedBy.rectangle(Rectangle(20, 50, 100, 200))` |
`ExtendedBy.point(Point)` | Similar to `ExtendedBy.rectangle`, but defined only by a `Point`. Not suitable for Template Matching / OCR, but suitable for clicking at a specific location inside another `WebElement` | `ExtendedBy.point(Point(200, 300))` |
`ExtendedBy.percentRectangle(RectangleF)` | Similar to `ExtendedBy.rectangle`, but defined relative to the parent element in percentages (all parameters: `x`, `y`, `width` and `height`) | `ExtendedBy.percentRectangle(RectangleF(0.1f, 0.2f, 0.5f, 0.3f))` |
`ExtendedBy.percentPoint(PointF)` | Similar to `ExtendedBy.percentRectangle`, but defined only by a point. `PointF(0.5f, 0.5f)` is the center of the parent element | `ExtendedBy.percentPoint(PointF(0.3f, 0.4f))` |
`ExtendedBy.twoPointRectangle(TwoPointRectangle)` | Gives exactly the same result as `ExtendedBy.rectangle`, but is defined by two points: top left and bottom right | `ExtendedBy.twoPointRectangle(TwoPointRectangle(Point(100, 200), Point(200, 400)))` |
`ExtendedBy.twoPointPercentRectangle(TwoPointRectangleF)` | Gives exactly the same result as `ExtendedBy.percentRectangle`, but is defined by two percentage points: top left and bottom right | `ExtendedBy.twoPointPercentRectangle(TwoPointRectangleF(PointF(0.1f, 0.2f), PointF(0.4f, 0.3f)))` |
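The relationship between the rectangle variants in the table can be sketched with plain geometry. The code below is an illustration under assumptions, not the library's code: it uses `java.awt.Rectangle` and `java.awt.Point` (argument order x, y, width, height) instead of the library's own types, `FractionRect` is a hypothetical stand-in for `RectangleF` with all fields expressed as fractions between 0 and 1, and the resulting coordinates are relative to the parent element's top left corner.

```kotlin
import java.awt.Point
import java.awt.Rectangle
import kotlin.math.roundToInt

// Hypothetical stand-in for RectangleF: every field is a fraction of the parent size.
data class FractionRect(val x: Float, val y: Float, val width: Float, val height: Float)

// Converts a fractional rectangle into pixels, relative to the parent's top left corner.
fun toAbsolute(parentWidth: Int, parentHeight: Int, fraction: FractionRect): Rectangle =
    Rectangle(
        (parentWidth * fraction.x).roundToInt(),
        (parentHeight * fraction.y).roundToInt(),
        (parentWidth * fraction.width).roundToInt(),
        (parentHeight * fraction.height).roundToInt()
    )

// Converts a top-left / bottom-right pair into an x, y, width, height rectangle.
fun fromTwoPoints(topLeft: Point, bottomRight: Point): Rectangle {
    require(bottomRight.x >= topLeft.x && bottomRight.y >= topLeft.y) {
        "bottomRight must not be above or to the left of topLeft"
    }
    return Rectangle(topLeft.x, topLeft.y, bottomRight.x - topLeft.x, bottomRight.y - topLeft.y)
}
```

Percentage-based areas keep pointing at the same relative region when the parent element is resized, while pixel-based rectangles are simpler when the parent has a fixed size.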