hrbrmstr / splashr

:sweat_drops: Tools to Work with the 'Splash' JavaScript Rendering Service in R

License: Other

R 100.00%

r rstats web-scraping splash selenium phantomjs har r-cyber

splashr's Introduction

`splashr` : Tools to Work with the ‘Splash’ JavaScript Rendering Service

TL;DR: This package works with Splash rendering servers which are really just a REST API & lua scripting interface to a QT browser. It’s an alternative to the Selenium ecosystem which was really engineered for application testing & validation.

Sometimes, all you need is a page scrape after javascript has been allowed to roam wild and free over meticulously crafted HTML tags. So, this package does not do everything Selenium can in pure R (though the Lua interface is equally powerful and accessible via R), but if you’re just trying to get a page back that needs javascript rendering, this is a nice, lightweight, consistent alternative.

It’s also an alternative to the somewhat abandoned phantomjs (which you can use in R within or without a Selenium context as it’s its own webdriver) and it may be useful to compare renderings between this package & phantomjs.

The package uses the stevedore package to orchestrate Docker on your system (if you have Docker; more on how to use the stevedore integration below) but you can also get it running in Docker on the command-line with two commands:

sudo docker pull scrapinghub/splash:latest
sudo docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash:latest --disable-browser-caches

Do whatever you Windows ppl do with Docker on your systems to make ^^ work.

Folks super-new to Docker on Unix-ish platforms should make sure to do:

sudo groupadd docker
sudo usermod -aG docker $USER

($USER is your username and should be defined for you in the environment)

If using the stevedore package you can use the convenience wrappers in this package:

install_splash()
splash_container <- start_splash()

and then run:

stop_splash(splash_container)

when done. All of that happens on your localhost and you will not need to specify splash_obj to many of the splashr functions if you’re running Splash in this default configuration as long as you use named parameters. You can also use the pre-defined splash_local object if you want to use positional parameters.

Now, you can run Selenium in Docker, so this is not unique to Splash. But, a Docker context makes it so that you don’t have to run or maintain icky Python stuff directly on your system. Leave it in the abandoned warehouse district where it belongs.

All you need for this package to work is a running Splash instance. You provide the host/port for it and it’s scrape-tastic fun from there!

About Splash

‘Splash’ https://github.com/scrapinghub/splash is a javascript rendering service. It’s a lightweight web browser with an ‘HTTP’ API, implemented in Python using ‘Twisted’ and ‘QT’ [and provides some of the core functionality of the ‘RSelenium’ or ‘seleniumPipes’ R packages but with a Java-free footprint]. The (twisted) ‘QT’ reactor is used to make the server fully asynchronous, which lets it take advantage of ‘webkit’ concurrency via the QT main loop. Splash features include the ability to process multiple webpages in parallel; retrieve HTML results and/or take screenshots; disable images or use Adblock Plus rules to make rendering faster; execute custom JavaScript in the page context; and get detailed rendering info in HAR format.

The following functions are implemented:

render_html: Return the HTML of the javascript-rendered page.
render_har: Return information about Splash interaction with a website in HAR format.
render_jpeg: Return an image (in JPEG format) of the javascript-rendered page.
render_png: Return an image (in PNG format) of the javascript-rendered page.
execute_lua: Execute a custom rendering script and return a result.
splash: Configure parameters for connecting to a Splash server
install_splash: Retrieve the Docker image for Splash
start_splash: Start a Splash server Docker container
stop_splash: Stop a running Splash server Docker container

Mini-DSL (domain-specific language). These can be used to create a “script” without actually scripting in Lua. They are a less-powerful/configurable set of calls than what you can make with a full Lua function but the idea is to have it take care of very common but simple use-cases, like waiting a period of time before capturing a HAR/HTML/PNG image of a site:

splash_plugins: Enable or disable browser plugins (e.g. Flash).
splash_click: Trigger mouse click event in web page.
splash_focus: Focus on a document element provided by a CSS selector
splash_images: Enable/disable images
splash_response_body: Enable or disable response content tracking.
splash_go: Go to a URL.
splash_wait: Wait for a period of time
splash_har: Return information about Splash interaction with a website in HAR format.
splash_html: Return an HTML snapshot of a current page.
splash_png: Return a screenshot of a current page in PNG format.
splash_press: Trigger mouse press event in web page.
splash_release: Trigger mouse release event in web page.
splash_send_keys: Send keyboard events to page context.
splash_send_text: Send text as input to page context, literally, character by character.
splash_user_agent: Overwrite the User-Agent header for all further requests. NOTE: There are many “helper” user agent strings to go with splash_user_agent. Look for objects in splashr starting with ua_.

httr helpers. These help turn various bits of splashr objects into httr-ish things:

as_req: Turn a HAR response entry into a working httr function you can use to make a request with
as_request: Turn a HAR response entry into an httr response-like object (i.e. you can use httr::content() on it)

Helpers:

tidy_har: Turn a gnHARly HAR object into a tidy data frame
get_body_size: Retrieve size of content | body | headers
get_content_size: Retrieve size of content | body | headers
get_content_type: Retrieve or test content type of a HAR request object
get_headers_size: Retrieve size of content | body | headers
is_binary: Retrieve or test content type of a HAR request object
is_content_type: Retrieve or test content type of a HAR request object
is_css: Retrieve or test content type of a HAR request object
is_gif: Retrieve or test content type of a HAR request object
is_html: Retrieve or test content type of a HAR request object
is_javascript: Retrieve or test content type of a HAR request object
is_jpeg: Retrieve or test content type of a HAR request object
is_json: Retrieve or test content type of a HAR request object
is_plain: Retrieve or test content type of a HAR request object
is_png: Retrieve or test content type of a HAR request object
is_svg: Retrieve or test content type of a HAR request object
is_xhr: Retrieve or test content type of a HAR request object
is_xml: Retrieve or test content type of a HAR request object

Some functions from HARtools are imported/exported and %>% is imported/exported.

TODO

Suggest more in a feature req!

~~Implement render.json~~
~~Implement “file rendering”~~
~~Implement execute (you can script Splash!)~~
~~Add integration with HARtools~~
Possibly writing R function wrappers to install/start/stop Splash which would also support enabling javascript profiles, request filters and proxy profiles from within R directly, using harbor
Re-implement render_file()
Testing results with all combinations of parameters

Installation

# CRAN
install.packages("splashr")

# DEV
# See DESCRIPTION for non-CINC-provided dependencies 
install.packages("splashr", repos = c("https://cinc.rud.is/"))

Usage

NOTE: ALL of these examples assume Splash is running in the default configuration on localhost (i.e. started with start_splash() or the docker example commands) unless otherwise noted.

library(splashr)
library(magick)
library(rvest)
library(anytime)
library(tidyverse)

# current version
packageVersion("splashr")

## [1] '0.7.0'

splash_active()

## [1] TRUE

splash_debug()

## List of 7
##  $ active  : list()
##  $ argcache: int 0
##  $ fds     : int 59
##  $ leaks   :List of 5
##   ..$ Deferred   : int 20
##   ..$ DelayedCall: int 3
##   ..$ LuaRuntime : int 1
##   ..$ QTimer     : int 1
##   ..$ Request    : int 1
##  $ maxrss  : int 219096
##  $ qsize   : int 0
##  $ url     : chr "http://localhost:8050"
##  - attr(*, "class")= chr [1:2] "splash_debug" "list"
## NULL

Notice the difference between a rendered HTML scrape and a non-rendered one:

render_html(url = "http://marvel.com/universe/Captain_America_(Steve_Rogers)")

## {html_document}
## <html lang="en">
## [1] <head class="at-element-marker">\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n<meta char ...
## [2] <body class="terrigen">\n<div id="__next"><div id="terrigen-page" class="page"><div id="page-wrapper" class="page ...

xml2::read_html("http://marvel.com/universe/Captain_America_(Steve_Rogers)")

## {html_document}
## <html lang="en">
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n<meta charset="utf-8" class="next-he ...
## [2] <body class="terrigen">\n<div id="__next"><div id="terrigen-page" class="page"><div id="page-wrapper" class="page ...

You can also profile pages:

render_har(url = "http://www.poynter.org/") -> har

print(har)

## --------HAR VERSION-------- 
## HAR specification version: 1.2 
## --------HAR CREATOR-------- 
## Created by: Splash 
## version: 3.4.1 
## --------HAR BROWSER-------- 
## Browser: QWebKit 
## version: 602.1 
## --------HAR PAGES-------- 
## Page id: 1 , Page title: Poynter - Poynter 
## --------HAR ENTRIES-------- 
## Number of entries: 270 
## REQUESTS: 
## Page: 1 
## Number of entries: 270 
##   -  http://www.poynter.org/ 
##   -  https://www.poynter.org/ 
##   -  https://www.googletagservices.com/tag/js/gpt.js 
##   -  https://www.poynter.org/wp-includes/js/wp-emoji-release.min.js?ver=5.3.2 
##   -  https://www.poynter.org/wp-includes/css/dist/block-library/style.min.css?ver=5.3.2 
##      ........ 
##   -  https://www.poynter.org/wp-content/themes/elumine-child/assets/fonts/PoynterOSDisplay/PoynterOSDisplay-BoldItalic... 
##   -  https://securepubads.g.doubleclick.net/gampad/ads?gdfp_req=1&pvsid=3405643238502098&correlator=142496486392385&ou... 
##   -  https://securepubads.g.doubleclick.net/gpt/pubads_impl_rendering_2020021101.js 
##   -  https://tpc.googlesyndication.com/safeframe/1-0-37/html/container.html 
##   -  https://static.chartbeat.com/js/chartbeat.js

tidy_har(har)

## # A tibble: 270 x 20
##    status started total_time page_ref timings req_url req_method req_http_version req_hdr_size req_headers req_cookies
##     <int> <chr>        <int> <chr>    <I<lis> <chr>   <chr>      <chr>                   <int> <I<list>>   <I<list>>  
##  1    301 2020-0…        171 1        <tibbl… http:/… GET        HTTP/1.1                  189 <tibble [2… <tibble [0…
##  2    200 2020-0…        238 1        <tibbl… https:… GET        HTTP/1.1                  189 <tibble [2… <tibble [0…
##  3    200 2020-0…        135 1        <tibbl… https:… GET        HTTP/1.1                  164 <tibble [3… <tibble [0…
##  4    200 2020-0…         93 1        <tibbl… https:… GET        HTTP/1.1                  164 <tibble [3… <tibble [0…
##  5    200 2020-0…        158 1        <tibbl… https:… GET        HTTP/1.1                  179 <tibble [3… <tibble [0…
##  6    200 2020-0…        218 1        <tibbl… https:… GET        HTTP/1.1                  179 <tibble [3… <tibble [0…
##  7    200 2020-0…        262 1        <tibbl… https:… GET        HTTP/1.1                  179 <tibble [3… <tibble [0…
##  8    200 2020-0…        265 1        <tibbl… https:… GET        HTTP/1.1                  179 <tibble [3… <tibble [0…
##  9    200 2020-0…        266 1        <tibbl… https:… GET        HTTP/1.1                  179 <tibble [3… <tibble [0…
## 10    200 2020-0…        269 1        <tibbl… https:… GET        HTTP/1.1                  179 <tibble [3… <tibble [0…
## # … with 260 more rows, and 9 more variables: resp_url <chr>, resp_rdrurl <chr>, resp_type <chr>, resp_size <int>,
## #   resp_cookies <I<list>>, resp_headers <I<list>>, resp_encoding <chr>, resp_content_size <dbl>, resp_content <chr>
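
Since tidy_har() hands back an ordinary data frame, the usual data-frame tools apply to it. The sketch below is only an illustration: har_df is a made-up stand-in that borrows three of the column names shown above (status, total_time, req_url), and the URLs and values in it are invented, not real Splash output.

```r
# Made-up stand-in for tidy_har(har); only three of the real columns are mimicked.
har_df <- data.frame(
  status     = c(301L, 200L, 200L, 404L),
  total_time = c(171L, 238L, 135L, 93L),
  req_url    = c("http://example.com/", "https://example.com/",
                 "https://example.com/app.js", "https://example.com/missing.png"),
  stringsAsFactors = FALSE
)

# How many requests ended with each HTTP status?
by_status <- table(har_df$status)

# Which requests took longest (total_time as reported in the HAR)?
slowest <- har_df[order(har_df$total_time, decreasing = TRUE),
                  c("req_url", "total_time")]

# Which requests came back with a 4xx/5xx status?
failed <- har_df[har_df$status >= 400L, "req_url"]
```

With a real HAR you would replace the stand-in with har_df <- tidy_har(har) and keep the rest unchanged.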

You can use HARtools::HARviewer (which this package imports and re-exports) to view the HAR in an interactive HTML widget.

Full web page snapshots are easy-peasy too:

render_png(url = "http://www.marveldirectory.com/individuals/c/captainamerica.htm")

render_jpeg(url = "http://static2.comicvine.com/uploads/scale_small/3/31666/5052983-capasr2016001-eptingvar-18bdb.jpg")

Executing custom Lua scripts

lua_ex <- '
function main(splash)
  splash:go("http://rud.is/b")
  splash:wait(0.5)
  local title = splash:evaljs("document.title")
  return {title=title}
end
'

splash_local %>% execute_lua(lua_ex) -> res

rawToChar(res) %>% 
  jsonlite::fromJSON()

## $title
## [1] "rud.is | \"In God we trust. All others must bring data\""

Interacting With Flash sites

splash_local %>%
  splash_plugins(TRUE) %>%
  splash_go("https://gis.cdc.gov/GRASP/Fluview/FluHospRates.html") %>%
  splash_wait(4) %>%
  splash_click(460, 550) %>%
  splash_wait(2) %>%
  splash_click(230, 85) %>%
  splash_wait(2) %>%
  splash_png()

stop_splash(splash_container)

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.


splashr's Issues

curl Recv failure: Connection reset by peer

I used this library without problems a few months ago. However, I installed it on two new computers (Ubuntu) today and got the same error on both.

I am working with a minimal example:

splash_container <- start_splash()
render_html(url = "https://www.qwant.com/")
stop_splash(splash_container)

It works well when I run the commands one by one. But when I try to run the three commands in one go, I get the error below after render_html:

Error in curl::curl_fetch_memory(url, handle = handle) :
  Recv failure: Connection reset by peer

I am not sure what the error is, but I managed to get around it by adding Sys.sleep(2) after start_splash(). Is it possible that render_html is called too fast, before the container created by start_splash is fully loaded and operational?

Thanks a lot for making this package!
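
If the cause really is that the container is not yet accepting connections, a fixed Sys.sleep(2) could be swapped for a small polling loop. The following is a minimal sketch under that assumption: wait_until() is a hypothetical helper (not part of splashr), and the commented usage assumes splash_active() reports whether the server is reachable, as its output in the README suggests.

```r
# Hypothetical helper: call `ready()` repeatedly until it returns TRUE
# or `timeout` seconds have passed. Errors from `ready()` count as "not yet".
wait_until <- function(ready, timeout = 30, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    ok <- tryCatch(isTRUE(ready()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Intended use (needs Docker and splashr, so it is left commented out):
# splash_container <- start_splash()
# if (!wait_until(splashr::splash_active)) stop("Splash did not come up in time")
# render_html(url = "https://www.qwant.com/")
# stop_splash(splash_container)
```

Polling returns as soon as the server answers, so it neither waits longer than needed nor gives up too early on a slow machine.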

Web scraping with splashr fails with curl error after many successes

I am scraping a few dozen URLs using splashr.
The code runs and completes fine when run directly from RStudio Server on my Digital Ocean Droplet. However, when it runs from a cron job it always fails when reading the 24th URL with this error:

Error in curl::curl_fetch_memory(url, handle = handle) : Recv failure: Connection reset by peer

Even when it works when running the code directly from RStudio, I see this error for the first 14 scrapes:

QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once.

But it completes OK.

Is there some memory management or garbage collection that I'm supposed to be doing between scrapes? What would account for the success of a direct run and the failure of the same script being run by a cron job? In short, how do I avoid the curl error mentioned above?

library("tidyverse")
library("splashr")
library("rvest")

# Launch SplashR
# system2("docker", args = c("pull scrapinghub/splash:latest"))
# system2("docker", args = c("run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash:latest"), wait = FALSE)
# splash_active()

pause_after_html_read <- 5
pause_after_html_text <- 3

for(idx in 1:28){  
  
  splash(host = "localhost", port = 8050L) |> 
    splash_response_body(FALSE) %>%
    splash_go(url = url_df$web_page[idx]) %>%
    splash_wait(pause_after_html_read) %>%
    splash_html() |> 
    html_text() -> pg
  
    Sys.sleep(pause_after_html_text)
}
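
One way to make a loop like this more tolerant of an occasional "Connection reset by peer" is to retry each scrape a few times before giving up. This is only a sketch of that idea: with_retry() is a hypothetical helper, and it does not explain why the cron run and the interactive run behave differently.

```r
# Hypothetical helper: run `fetch()` up to `attempts` times, pausing a bit
# longer after each failure, and re-raise the last error if all attempts fail.
with_retry <- function(fetch, attempts = 3, pause = 2) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    if (i < attempts) Sys.sleep(pause * i)
  }
  stop("all ", attempts, " attempts failed: ", conditionMessage(last_error))
}

# Intended use inside the loop above (left commented out, needs a Splash server):
# pg <- with_retry(function() {
#   splash(host = "localhost", port = 8050L) %>%
#     splash_response_body(FALSE) %>%
#     splash_go(url = url_df$web_page[idx]) %>%
#     splash_wait(pause_after_html_read) %>%
#     splash_html() %>%
#     html_text()
# })
```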

Difficulty with installing and starting/stopping splashr within R?

So I've been having a bit of trouble with setting up Docker and splashr on my Windows 10 Home computer. I think I installed Docker correctly, but for some reason, I can only use splashr when I run

docker pull scrapinghub/splash:3.0
docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash:3.0

in the Docker Quickstart Terminal.

This is okay, but I can't use start_splash() or stop_splash(), as I can't get install_splash() to work. Here's my error:

Any idea what I should do/if this is even an issue?

unable to get splashr to render content

Hey Bob,

Thank you for making this (and all your other) pkg - i'm a big fan and use it regularly to scrape stuff. So much easier and faster than the selenium route. This might be the wrong channel to ask this so please feel free to ignore/close this issue.

Here's my problem:
Recently Spencer Graves asked this question on the R-help mailing list - how to scrape this site
https://www.battleforthenet.com/scoreboard/

At first i thought ahh it would be a breeze with splashr so i tried

library("splashr")
url <- "https://www.battleforthenet.com/scoreboard"

page <- splashr::render_html(url = url, wait = 10)

res <- 
  page %>% 
  rvest::html_nodes("#senate") %>% 
  rvest::html_nodes(".politicians")

# returns
> {xml_nodeset (0)}

but this doesn't seem to work. Trying to see what's going on

res <- 
  page %>% 
  rvest::html_nodes("#senate") %>% 
  xml2::html_structure()

# returns
> [[1]]
<div#senate .politicians>
  {text}
  <h2>
    {text}
    <em>
      {text}
    {text}
  {text}
  <team-legend>
  <p>
    {text}
  {text}
  <politician-card [v-for, :politician, v-if]>

I'm not very knowledgeable of web-dev things so i apologize if this might be something obvious. I'm trying to understand what's going on and why this does not work with splashr. My best guess would be that there is some kind of secret js mumbo jumbo going on which manages to keep away the real content from splashr...

I would be thankful if you could just point me in some direction on how to do this.
Thanks again for all your work!
david

Error on Windows

Originally posted here https://stackoverflow.com/questions/47739932/r-splashr-error-on-windows

Hi, I'm trying to get Splashr working following this tutorial. https://rud.is/b/2017/02/09/diving-into-dynamic-website-content-with-splashr/

I've installed Docker for Windows successfully, Docker SDK for Python and (hopefully) the dependent Python packages. I've set the path for Python in System Variables and tried this R code with both Python 2.7 and 3.6 but get the same error:

library(splashr)
install_splash()

Error in py_call_impl(callable, dots$args, dots$keywords) :
DockerException: Credentials store error: StoreError('Unexpected OS error "The handle is invalid", errno=9',)

Detailed traceback:
File "C:\Python36\lib\site-packages\docker\api\image.py", line 381, in pull
header = auth.get_config_header(self, registry)
File "C:\Python36\lib\site-packages\docker\auth.py", line 50, in get_config_header
authcfg = resolve_authconfig(client._auth_configs, registry)
File "C:\Python36\lib\site-packages\docker\auth.py", line 97, in resolve_authconfig
authconfig, registry, store_name
File "C:\Python36\lib\site-packages\docker\auth.py", line 142, in _resolve_authconfig_credstore
'Credentials store error: {0}'.format(repr(e))

I'm using Windows 10 Pro Version 1703

R version 3.4.3

R Studio Version 1.1.383

Thanks in advance

Scrolling

I'm trying to scroll down a page (it only adds more rows as you do so) by using page down: splash_send_keys( "").
However, this doesn't seem to work. I tried using execute_lua to call splash:jsfunc("window.scrollTo"), but couldn't seem to get this to work. Would it be possible to incorporate scrolling up and down?
Great package otherwise!
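
One possible approach, sketched here rather than confirmed for this site, is to scroll from inside a Lua script with Splash's splash:runjs() and hand that script to execute_lua(). lua_scroll_script() is a hypothetical helper that only builds the script text; the number of scrolls and the pause are assumptions you would tune per page.

```r
# Hypothetical helper: build a Lua script that loads `url`, scrolls to the
# bottom `times` times (waiting `pause` seconds each time), and returns the HTML.
lua_scroll_script <- function(url, times = 3, pause = 1) {
  stopifnot(is.character(url), length(url) == 1,
            is.numeric(times), is.numeric(pause))
  # Refuse characters that would break out of the Lua string literal.
  if (grepl('"', url, fixed = TRUE) || grepl("\\", url, fixed = TRUE)) {
    stop("url must not contain double quotes or backslashes")
  }
  sprintf(
    'function main(splash)
  splash:go("%s")
  splash:wait(%s)
  for i = 1, %d do
    splash:runjs("window.scrollTo(0, document.body.scrollHeight);")
    splash:wait(%s)
  end
  return splash:html()
end',
    url, pause, as.integer(times), pause
  )
}

# Intended use (left commented out, needs a running Splash server):
# res <- splash_local %>% execute_lua(lua_scroll_script("http://rud.is/b", times = 5))
# page <- xml2::read_html(rawToChar(res))
```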

Error in render_html - Bad Gateway (HTTP 502)

I'm trying to access a link with the command:

splash("localhost") %>% render_html("http://www.uol.com.br")

but it returns that output error:

Error in render_html(., "http://www.uol.com.br") : Bad Gateway (HTTP 502).

What should I do? It was working fine until today.

splash_select function

Is it possible to select an element and click on it as described in mouse_click?

local button = splash:select('button')
button:mouse_click()

Since you already have splash_click and splash_focus, it would be great to have an equivalent splash_select R function.

Thank you for a great package.

Can't execute install_splash()

Hi,
Following the instructions reported on this blog post, I got stuck when executing install_splash().
The provided error is the following one:
Error in sys::exec_internal(docker, args = args, error = TRUE) : Executing '/usr/bin/docker' failed with status 1
despite the fact that I can definitely execute docker by typing /usr/bin/docker or even only docker in the terminal. My OS is Ubuntu 16.04.2 LTS.

Capture console output and uncaught JS errors

This is likely a splash question as opposed to splashr, but was curious if you know whether it's possible to capture error messages and console.logs. There seems to be an option for console = TRUE in render_json but not sure if that does anything.

installation error

✓ [master]> library(splashr)
ℹ splashr not installed. Trying to install now
ℹ Installing splashr
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Error: package or namespace load failed for 'splashr' in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/usr/lib/R/library/magick/libs/magick.so':
  /usr/lib/x86_64-linux-gnu/libjemalloc.so.2: cannot allocate memory in static TLS block

Following Rdatatable/data.table#5030 (comment), I downloaded the GitHub project and ran Install & Restart, but it didn't help.

Following the README.md via richfitz/stevedore#59, the package installed, but that didn't help either.

is this project going to be high priority?

Hi harbourmaster,

I noticed that phantomJs is pretty much dead. Is this the opportunity to boost splashr? What are your plans?

Thanks

Question about finding coords for splash_click

I'm trying to use splashr's splash_click() function. As I work through the example at the bottom of the splashr tutorial, it doesn't seem to work for me.

The png that returns to me is covered up by the pop-up that initially appears when visiting the site. The documentation specifies that coords for the click function need to be "relative to viewport". I'm not sure what that means, but I'm guessing mine aren't.

How can I be sure I am using the right coords?

splash_local %>%
  splash_plugins(TRUE) %>%
  splash_go("https://gis.cdc.gov/GRASP/Fluview/FluHospRates.html") %>%
  splash_wait(4) %>%
  splash_click(460, 550) %>%
  splash_wait(2) %>%
  splash_click(230, 85) %>%
  splash_wait(2) %>%
  splash_png()

Difficulty rendering local html file

Hi, thank you for another handy package.

Is it possible to render local html files? I get a 502 error.

x <- render_html(url = "./file.html")
# Error in render_html(url = "./file.html") : 
#   Bad Gateway (HTTP 502).

Can't pull multiple pages

I'm trying to update a scraper I had that scrapes a site that switched to JavaScript. I've successfully used splashr to scrape a single page. The problem, most likely user error, I'm having is in scraping multiple "pages" at the site. (games.crossfit.com)

MWE

library(tidyverse)
library(lubridate)
library(stringr)
library(rvest)
library(splashr)

# set up the splasher docker container
install_splash()  
splash_svr <- start_splash()
pond <- splash("localhost")

# test to see if the server is active:
pond %>% splash_active()

base_url <- "https://games.crossfit.com/leaderboard?competition=1&year=2017&division=1&scaled=0&sort=0&fittest=1&fittest1=0&occupation=0&page="

url1 <- paste0(base_url, 1)
url2 <- paste0(base_url, 2)

page1 <- render_html(pond, url1) 
page2 <- render_html(pond, url2) 

# get the athletes names as test
# sometimes both work, sometimes one and sometimes, neither work
head(html_text(html_nodes(page1, css = "td .full-name")))
head(html_text(html_nodes(page2, css = "td .full-name")))

I'm new to web scraping so more than likely I'm simply going about this the wrong way.

Using http_post

Hi Bob,

Thanks for yet another fantastic package!

I'm wondering whether I can use splashr on a site where I have to login with a username and password and then render the javascript on the site. I quickly browsed the splash documentation and this might be possible with splash::http_post.

My question is: is this possible using your package? If not, will you consider adding it?

Thanks again!

execute_lua() save_args/load_args arguments unclear

For the command execute_lua(), the documentation says that you can load a list of arguments to the cache with save_args, but I can't seem to get the load_args parameter to work correctly. Is it also supposed to be in a list format with the position of the arguments corresponding to the save_args argument?

Would it be possible for you to add a more clear example to the documentation? A great example would be passing a url through to the script so that it can be reusable.

Thank you for your help with this and your work on this project, it has made my life much easier overall.
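
Until save_args/load_args are documented more fully, one workaround for the "reusable script" case is to build the Lua text in R and substitute the URL before calling execute_lua(). This sketch sidesteps the argument cache entirely rather than explaining it; lua_title_script() is a hypothetical helper modelled on the README's Lua example.

```r
# Hypothetical helper: return the README's title-grabbing Lua script with
# `url` and `wait` filled in, so the same template works for any page.
lua_title_script <- function(url, wait = 0.5) {
  stopifnot(is.character(url), length(url) == 1, is.numeric(wait))
  # Refuse characters that would break out of the Lua string literal.
  if (grepl('"', url, fixed = TRUE) || grepl("\\", url, fixed = TRUE)) {
    stop("url must not contain double quotes or backslashes")
  }
  sprintf(
    'function main(splash)
  splash:go("%s")
  splash:wait(%s)
  local title = splash:evaljs("document.title")
  return {title=title}
end',
    url, wait
  )
}

# Intended use (left commented out, needs a running Splash server):
# res <- splash_local %>% execute_lua(lua_title_script("http://rud.is/b"))
# jsonlite::fromJSON(rawToChar(res))
```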

Error in py_get_attr_impl(x, name, silent) : AttributeError: 'Client' object has no attribute 'api'

the first error will vanish after 'pip install docker-py' or 'conda install docker-py' but then comes another one, which becomes a blocker:

This is on Windows 10 Pro. Any ideas how to proceed?

can't install splash from R command install_splash()

I can't install Splash with the R command install_splash(), even though I followed the solution in #3

library(splashr)
install_splash()
Error in sys::exec_internal(docker, args = args, error = TRUE) :
Executing '/usr/bin/docker' failed with status 1

my OS is CentOS 7, here is the error message in the log file

Mar 10 11:41:09 localhost systemd: Started Docker Application Container Engine.
Mar 10 11:41:36 localhost chronyd[610]: Source 61.216.153.104 replaced with 202.112.29.82
Mar 10 11:41:54 localhost dockerd-current: time="2017-03-10T11:41:54.777093856+08:00" level=info msg="{Action=create, Username=gcshen, LoginUID=1000, PID=2624}"
Mar 10 11:41:54 localhost dockerd-current: time="2017-03-10T11:41:54.997717009+08:00" level=warning msg="Error getting v2 registry: Get https://registry-1.docker.io/v2/: write tcp 192.168.1.101:49126->50.17.62.194:443: write: connection reset by peer"
Mar 10 11:41:54 localhost dockerd-current: time="2017-03-10T11:41:54.997741287+08:00" level=error msg="Attempting next endpoint for pull after error: Get https://registry-1.docker.io/v2/: write tcp 192.168.1.101:49126->50.17.62.194:443: write: connection reset by peer"
Mar 10 11:41:57 localhost dockerd-current: time="2017-03-10T11:41:57.169904010+08:00" level=error msg="Not continuing with pull after error: Tag latest not found in repository docker.io/hrbrmstr/splashttpd"
Mar 10 11:46:05 localhost dockerd-current: time="2017-03-10T11:46:05.385662113+08:00" level=info msg="{Action=create, Username=gcshen, LoginUID=1000, PID=2644}"
Mar 10 11:46:07 localhost dockerd-current: time="2017-03-10T11:46:07.079760539+08:00" level=error msg="Error trying v2 registry: Get https://registry-1.docker.io/v2/hrbrmstr/splashttpd/manifests/latest: Get https://auth.docker.io/token?scope=repository%3Ahrbrmstr%2Fsplashttpd%3Apull&service=registry.docker.io: read tcp 192.168.1.101:49134->50.17.62.194:443: read: connection reset by peer"
Mar 10 11:46:07 localhost dockerd-current: time="2017-03-10T11:46:07.079797105+08:00" level=error msg="Attempting next endpoint for pull after error: Get https://registry-1.docker.io/v2/hrbrmstr/splashttpd/manifests/latest: Get https://auth.docker.io/token?scope=repository%3Ahrbrmstr%2Fsplashttpd%3Apull&service=registry.docker.io: read tcp 192.168.1.101:49134->50.17.62.194:443: read: connection reset by peer"

Ability to access htmlwidgets, local HTMLs

It would be awesome if we can have

render_widget(widget_instance)
render_* to accept file:/// URLs with a path on the host

For all these the biggest inhibitor is that splash is running inside a docker container, so need a way to mount local files.

It'll also be nice if render_* can connect to host's loopback somehow. So we can also use http://localhost:... URLs, where localhost is the host not the docker image.

how to set headers

In render_png(..., headers, ...) I'd like to be able to set the User-Agent header, as some sites check it to decide whether to display a mobile version. What's the format to use in the headers parameter?

I tried headers = httr::add_headers(`User-Agent` = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.39 Safari/537.36") and also list(), but no luck. If you could point me in the right direction I'd appreciate it.

Thanks for a great package!

displaying HAR