maaaaz / webscreenshot

A simple script to screenshot a list of websites

License: GNU Lesser General Public License v3.0

JavaScript 19.30% Python 80.70%

webscreenshot

Description

A simple script to screenshot a list of websites, based on the url-to-image PhantomJS script.

Features

Integrating url-to-image 'lazy-rendering' for AJAX resources
Fully functional on Windows and Linux systems
Cookie and custom HTTP header definition support for the PhantomJS renderer
Multiprocessing and killing of unresponding processes after a user-definable timeout
Accepting several formats as input target
Customizing screenshot size (width, height), format and quality
Mapping useful options of PhantomJS such as ignoring ssl error, proxy definition and proxy authentication, HTTP Basic Authentication
Supports multiple renderers:
- PhantomJS, which is legacy and abandoned but the one still producing the best results
- Chromium, Chrome and Edge Chromium, which will replace PhantomJS but currently have some limitations: screenshotting an HTTPS website that lacks a valid certificate (a self-signed one, for instance) produces an empty screenshot.
  The reason is that the --ignore-certificate-errors option no longer works and never will again: the solution is to use a proper webdriver, but to date webscreenshot doesn't aim to support this rather complex method, which requires third-party tools.
- Firefox can also be used as a renderer but has some serious limitations (so don't use it for the moment):
  - Impossibility to perform multiple screenshots at the same time: the firefox process cannot run as multiple instances
  - No incognito mode, using webscreenshot will pollute your browsing history
Embedding screenshot URL in image (requires ImageMagick)

Usage

Put your targets in a text file and pass it with the -i option, or as a positional argument if you have just a single URL.
Screenshots will be available, by default, in your current ./screenshots/ directory.
Accepted input formats are the following:

http(s)://domain_or_ip:port(/resource)
domain_or_ip:port(/resource)
domain_or_ip(/resource)
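
The three input formats above can be thought of as shorthand for a full `scheme://host:port/resource` URL. The sketch below is only an illustration of that expansion, not the script's actual code: `normalize_target` and `DEFAULT_PORTS` are hypothetical names, and the rule that a target without a scheme defaults to plain HTTP is an assumption based on the example logs further down.

```python
from urllib.parse import urlsplit

# Assumed default ports per scheme (hypothetical helper, not part of webscreenshot)
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_target(target, force_ssl=False):
    """Expand an accepted input format into scheme://host:port(/resource)."""
    target = target.strip()
    if "://" not in target:
        # No scheme given: assume HTTP unless SSL/TLS is enforced
        target = ("https" if force_ssl else "http") + "://" + target
    parts = urlsplit(target)
    scheme = "https" if force_ssl else parts.scheme
    if scheme not in DEFAULT_PORTS:
        raise ValueError("unsupported scheme: %r" % scheme)
    port = parts.port or DEFAULT_PORTS[scheme]
    resource = parts.path + ("?" + parts.query if parts.query else "")
    return "%s://%s:%d%s" % (scheme, parts.hostname, port, resource)
```

With this sketch, `normalize_target("google.fr")` gives `http://google.fr:80`, which matches the formatting reported by the `-v` output in the Examples section.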

Options

webscreenshot.py version 2.94

usage: webscreenshot.py [-h] [-i INPUT_FILE] [-o OUTPUT_DIRECTORY] [-w WORKERS] [-v] [--no-error-file] [-z SINGLE_OUTPUT_FILE] [-p PORT] [-s] [-m]
                        [-r {phantomjs,chrome,chromium,edgechromium,firefox}] [--renderer-binary RENDERER_BINARY] [--no-xserver] [--window-size WINDOW_SIZE]
                        [-f {pdf,png,jpg,jpeg,bmp,ppm}] [-q [0-100]] [--ajax-max-timeouts AJAX_MAX_TIMEOUTS] [--crop CROP] [--custom-js CUSTOM_JS] [-l]
                        [--label-size LABEL_SIZE] [--label-bg-color LABEL_BG_COLOR] [--imagemagick-binary IMAGEMAGICK_BINARY] [-c COOKIE] [-a HEADER]
                        [-u HTTP_USERNAME] [-b HTTP_PASSWORD] [-P PROXY] [-A PROXY_AUTH] [-T PROXY_TYPE] [-t TIMEOUT]
                        [URL]

optional arguments:
  -h, --help            show this help message and exit

Main parameters:
  URL                   Single URL target given as a positional argument
  -i INPUT_FILE, --input-file INPUT_FILE
                        <INPUT_FILE> text file containing the target list. Ex: list.txt
  -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
                        <OUTPUT_DIRECTORY> (optional): screenshots output directory (default './screenshots/')
  -w WORKERS, --workers WORKERS
                        <WORKERS> (optional): number of parallel execution workers (default 4)
  -v, --verbosity       <VERBOSITY> (optional): verbosity level, repeat it to increase the level { -v INFO, -vv DEBUG } (default verbosity ERROR)
  --no-error-file       <NO_ERROR_FILE> (optional): do not write a file with the list of URL of failed screenshots (default false)
  -z SINGLE_OUTPUT_FILE, --single-output-file SINGLE_OUTPUT_FILE
                        <SINGLE_OUTPUT_FILE> (optional): name of a file which will be the single output of all inputs. Ex. test.png

Input processing parameters:
  -p PORT, --port PORT  <PORT> (optional): use the specified port for each target in the input list. Ex: -p 80
  -s, --ssl             <SSL> (optional): enforce SSL/TLS for every connection
  -m, --multiprotocol   <MULTIPROTOCOL> (optional): perform screenshots over HTTP and HTTPS for each target

Screenshot renderer parameters:
  -r {phantomjs,chrome,chromium,edgechromium,firefox}, --renderer {phantomjs,chrome,chromium,edgechromium,firefox}
                        <RENDERER> (optional): renderer to use among 'phantomjs' (legacy but best results), 'chrome', 'chromium', 'edgechromium', 'firefox'
                        (version > 57) (default 'phantomjs')
  --renderer-binary RENDERER_BINARY
                        <RENDERER_BINARY> (optional): path to the renderer executable if it cannot be found in $PATH
  --no-xserver          <NO_X_SERVER> (optional): if you are running without an X server, will use xvfb-run to execute the renderer (by default, trying to
                        detect if DISPLAY environment variable exists

Screenshot image parameters:
  --window-size WINDOW_SIZE
                        <WINDOW_SIZE> (optional): width and height of the screen capture (default '1200,800')
  -f {pdf,png,jpg,jpeg,bmp,ppm}, --format {pdf,png,jpg,jpeg,bmp,ppm}
                        <FORMAT> (optional, phantomjs only): specify an output image file format, "pdf", "png", "jpg", "jpeg", "bmp" or "ppm" (default
                        'png')
  -q [0-100], --quality [0-100]
                        <QUALITY> (optional, phantomjs only): specify the output image quality, an integer between 0 and 100 (default 75)
  --ajax-max-timeouts AJAX_MAX_TIMEOUTS
                        <AJAX_MAX_TIMEOUTS> (optional, phantomjs only): per AJAX request, and max URL timeout in milliseconds (default '1400,1800')
  --crop CROP           <CROP> (optional, phantomjs only): rectangle <t,l,w,h> to crop the screen capture to (default to WINDOW_SIZE: '0,0,w,h'), only
                        numbers, w(idth) and h(eight). Ex. "10,20,w,h"
  --custom-js CUSTOM_JS
                        <CUSTOM_JS> (optional, phantomjs only): path of a file containing JavaScript code to be executed before taking the screenshot. Ex:
                        js.txt

Screenshot label parameters:
  -l, --label           <LABEL> (optional): for each screenshot, create another one displaying inside the target URL (requires imagemagick)
  --label-size LABEL_SIZE
                        <LABEL_SIZE> (optional): font size for the label (default 60)
  --label-bg-color LABEL_BG_COLOR
                        <LABEL_BACKGROUND_COLOR> (optional): label imagemagick background color (default NavajoWhite)
  --imagemagick-binary IMAGEMAGICK_BINARY
                        <LABEL_BINARY> (optional): path to the imagemagick binary (magick or convert) if it cannot be found in $PATH

HTTP parameters:
  -c COOKIE, --cookie COOKIE
                        <COOKIE_STRING> (optional): cookie string to add. Ex: -c "JSESSIONID=1234; YOLO=SWAG"
  -a HEADER, --header HEADER
                        <HEADER> (optional): custom or additional header. Repeat this option for every header. Ex: -a "Host: localhost" -a "Foo: bar"
  -u HTTP_USERNAME, --http-username HTTP_USERNAME
                        <HTTP_USERNAME> (optional): specify a username for HTTP Basic Authentication.
  -b HTTP_PASSWORD, --http-password HTTP_PASSWORD
                        <HTTP_PASSWORD> (optional): specify a password for HTTP Basic Authentication.

Connection parameters:
  -P PROXY, --proxy PROXY
                        <PROXY> (optional): specify a proxy. Ex: -P http://proxy.company.com:8080
  -A PROXY_AUTH, --proxy-auth PROXY_AUTH
                        <PROXY_AUTH> (optional): provides authentication information for the proxy. Ex: -A user:password
  -T PROXY_TYPE, --proxy-type PROXY_TYPE
                        <PROXY_TYPE> (optional): specifies the proxy type, "http" (default), "none" (disable completely), or "socks5". Ex: -T socks
  -t TIMEOUT, --timeout TIMEOUT
                        <TIMEOUT> (optional): renderer execution timeout in seconds (default 30 sec)

Examples

list.txt
--------
http://google.fr
https://216.58.213.131
216.58.213.131
https://duckduckgo.com/robots.txt


Default execution with a list
-----------------------------
$ python webscreenshot.py -i list.txt
webscreenshot.py version 2.3

[+] 4 URLs to be screenshot
[+] 4 actual URLs screenshot
[+] 0 error(s)


Default execution with a single URL
-----------------------------------
$ python webscreenshot.py -v google.fr
webscreenshot.py version 2.3

[INFO][General] 'google.fr' has been formatted as 'http://google.fr:80' with supplied overriding options
[+] 1 URLs to be screenshot
[INFO][http://google.fr:80] Screenshot OK

[+] 1 actual URLs screenshot
[+] 0 error(s)


Increasing verbosity level execution
-----------------------------------
$ python webscreenshot.py -i list.txt -v
webscreenshot.py version 2.3

[INFO][General] 'http://google.fr' has been formatted as 'http://google.fr:80' with supplied overriding options
[INFO][General] 'https://216.58.213.131' has been formatted as 'https://216.58.213.131:443' with supplied overriding options
[INFO][General] '216.58.213.131' has been formatted as 'http://216.58.213.131:80' with supplied overriding options
[INFO][General] 'https://duckduckgo.com/robots.txt' has been formatted as 'https://duckduckgo.com:443/robots.txt' with supplied overriding options
[+] 4 URLs to be screenshot
[INFO][https://duckduckgo.com:443/robots.txt] Screenshot OK

[INFO][http://216.58.213.131:80] Screenshot OK

[INFO][https://216.58.213.131:443] Screenshot OK

[INFO][http://google.fr:80] Screenshot OK

[+] 4 actual URLs screenshot
[+] 0 error(s)


Results
-------
$ ls -l screenshots/
total 187
-rwxrwxrwx 1 root root 53805 May 19 16:04 http_216.58.213.131_80.png
-rwxrwxrwx 1 root root 53805 May 19 16:05 http_google.fr_80.png
-rwxrwxrwx 1 root root 53805 May 19 16:04 https_216.58.213.131_443.png
-rwxrwxrwx 1 root root 27864 May 19 16:04 https_duckduckgo.com_443_robots.txt.png
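
The file names in the listing above appear to be derived from the formatted URL. As a rough sketch that reproduces those four names (the function name `screenshot_filename` is hypothetical, and the script's real sanitization rules, including the filename length limit added in version 2.93, may differ):

```python
import re


def screenshot_filename(url, extension="png"):
    """Guess the output file name for an already formatted URL."""
    # "http://host:80/path" -> "http_host:80/path"
    name = url.replace("://", "_")
    # Remaining ":" and "/" separators become underscores
    name = re.sub(r"[:/]", "_", name)
    return "%s.%s" % (name, extension)
```

This can help when another program needs to locate the screenshot of a given target inside the output directory.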

Supported options by renderers

Options not listed below are supported by every current renderer.

| Option category | Option | PhantomJS renderer | Chromium / Chrome / Edge Chromium renderer | Firefox renderer |
|---|---|---|---|---|
| Screenshot parameters | format (`-f`) | Yes | No | No |
| | quality (`-q`) | Yes | No | No |
| | ajax and request timeouts (`--ajax-max-timeouts`) | Yes | No | No |
| | crop (`--crop`) | Yes | No | No |
| | custom JavaScript (`--custom-js`) | Yes | No | No |
| HTTP parameters | cookie (`-c`) | Yes | No | No |
| | header (`-a`) | Yes | No | No |
| | http_username (`-u`) | Yes | No | No |
| | http_password (`-b`) | Yes | No | No |
| Connection parameters | proxy (`-P`) | Yes | Yes | No |
| | proxy_auth (`-A`) | Yes | No | No |
| | proxy_type (`-T`) | Yes | Yes | No |
| | Ability to screenshot a HTTPS website with a non-publicly-signed certificate | Yes | No | No |

Requirements

A Python interpreter with version 2.7 or 3.X
The webscreenshot python script:
- The easiest way to set it up: pip install webscreenshot and then directly use $ webscreenshot
- Or git clone this repository and pip install -r requirements.txt and then python webscreenshot.py
The PhantomJS tool with at least version 2: follow the installation guide and check the FAQ if necessary
Chrome, Chromium or Firefox > 57 if you want to use one of these renderers
xvfb if you want to run webscreenshot on a headless OS: use the --no-xserver webscreenshot option to ease everything
ImageMagick binary (magick or convert) if you want to embed URL in screenshots with the --label option: follow the installation guide
Check the FAQ before reporting issues

Changelog

version 2.94 - 08/23/2020: Added custom-js and single output file options
version 2.93 - 08/16/2020: Added support of Python 3.8 and Microsoft Edge Chromium ; file output for failed webscreenshots ; filename length limitation for long URL
version 2.92 - 06/21/2020: no_xserver option autodetection
version 2.91 - 05/08/2020: Multiprotocol mode fix
version 2.9 - 01/26/2020: Few fixes
version 2.8 - 01/11/2020: Few fixes, ajax timeouts + crop + label size + label font options added, default values for ajaxTimeout and maxTimeout changed
version 2.7 - 01/04/2020: URL embedding in screenshot option added
version 2.6 - 12/27/2019: Few fixes
version 2.5 - 09/22/2019: Image quality and format options added, PhantomJS useragent updated, modern TLD support
version 2.4 - 05/30/2019: Few fixes for Windows support
version 2.3 - 05/19/2019: Python 3 compatibility, Firefox renderer added, no-xserver option added
version 2.2 - 08/13/2018: Chrome and Chromium renderers support and single URL support
version 2.1 - 01/14/2018: Multiprotocol option addition and PyPI packaging
version 2.0 - 03/08/2017: Adding proxy-type option
version 1.9 - 01/10/2017: Using ALL SSL/TLS ciphers
version 1.8 - 07/05/2015: Option groups definition
version 1.7 - 06/28/2015: HTTP basic authentication support + loglevel option changed to verbosity
version 1.6 - 04/23/2015: Transparent background fix
version 1.5 - 01/11/2015: Cookie and custom HTTP header support
version 1.4 - 10/12/2014: url-to-image PhantomJS script integration + few bugs corrected
version 1.3 - 08/05/2014: Windows support + few bugs corrected
version 1.2 - 04/27/2014: Few bugs corrected
version 1.1 - 04/21/2014: Changed the script to use PhantomJS instead of the buggy wkhtml binary
version 1.0 - 01/12/2014: Initial commit

Copyright and license

webscreenshot is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

webscreenshot is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with webscreenshot. If not, see http://www.gnu.org/licenses/.

Contact

Thomas Debize < tdebize at mail d0t com >

webscreenshot's Issues

[ERROR][https://staging.site.com:/] Shell command PID 6787 returned an abnormal error code: '1'

[ERROR][https://staging.site.com:/] Shell command PID 6787 returned an abnormal error code: '1'
ScreenShot Somehow failed

I get this error whenever I run the script and can't seem to find a solution ...

preexec_fn is not supported on Windows platforms

When running on Windows 10 and Python 3.8.0 I received the below error:

python webscreenshot.py google.com
webscreenshot.py version 2.8

[+] 1 URLs to be screenshot
[ERROR][General] Unknown error: preexec_fn is not supported on Windows platforms, exiting
[+] 0 actual URLs screenshot
[+] 1 error(s)
    http://google.com:80

As a quick fix removing "preexec_fn=group_subprocesses" from the call to Popen (shown below) seemed to resolve this. You may want to consider removing this or looking at alternatives.

p = subprocess.Popen(shlex.split(command, posix=not(is_windows())), shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE, preexec_fn=group_subprocesses)
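
An alternative to removing the argument entirely would be to pass `preexec_fn` only on platforms that support it. The following is a minimal sketch of that idea, not the project's actual fix: `launch` is a hypothetical wrapper, and `os.setpgrp` is used as a stand-in because the body of `group_subprocesses` is not shown in this report.

```python
import os
import shlex
import subprocess
import sys


def is_windows():
    return os.name == "nt"


def launch(command):
    """Start a command, grouping subprocesses only where preexec_fn exists."""
    kwargs = {
        "shell": False,
        "stdout": subprocess.PIPE,
        "stderr": subprocess.PIPE,
    }
    if not is_windows():
        # Assumption: stand-in for group_subprocesses (POSIX only)
        kwargs["preexec_fn"] = os.setpgrp
    args = shlex.split(command, posix=not is_windows())
    return subprocess.Popen(args, **kwargs)


if __name__ == "__main__":
    proc = launch('"%s" --version' % sys.executable)
    proc.communicate()
    print(proc.returncode)
```

On Windows the call then falls back to a plain `Popen`, which matches the quick fix described above while keeping the grouping behaviour elsewhere.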

All ports screenshots

Is it possible to screenshot a URL on all of its open ports? If so, what's the command? If not, would it be possible to implement it?

thx
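
One possible workaround, assuming the open ports are already known from a separate scan (webscreenshot itself is not described as doing port discovery), is to generate one `domain_or_ip:port` line per port and pass the resulting file with `-i`. The helper names below are hypothetical.

```python
def expand_ports(host, ports):
    """Return one 'host:port' target per port, as accepted by the -i input file."""
    return ["%s:%d" % (host, port) for port in ports]


def write_target_list(path, hosts_to_ports):
    """Write every host/port combination to a target list file."""
    lines = []
    for host, ports in hosts_to_ports.items():
        lines.extend(expand_ports(host, sorted(ports)))
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

For example, `write_target_list("list.txt", {"example.com": [80, 8080]})` would produce a file that can then be used as `python webscreenshot.py -i list.txt`.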

[ERROR][https://www.gpw.pl:443/session-details] Shell command PID 5346 returned an abnormal error code: '1'

[ERROR][https://www.gpw.pl:443/session-details] Shell command PID 5346 returned an abnormal error code: '1'
[ERROR][https://www.gpw.pl:443/session-details] Screenshot somehow failed
This seems to work for a few URLs, but not for most of the others.

I checked that PhantomJS is the latest version:
[root@XXX ~]# phantomjs --version
2.1.1

[root@XXX ~]# phantomjs
phantomjs> phantom.version
{
"major": 2,
"minor": 1,
"patch": 1
}

I get this error whenever I run the script and can't seem to find a solution ...

Memory Exhaustion possibly due to unfinished PhantomJS Processes

While running webscreenshot on a Digital Ocean Ubuntu box with 2 GB of RAM against a list of approximately 7500 valid HTTP servers, the tool starts out working correctly, with an overall memory consumption of 250-300 MB across the whole OS.

As the tool continues working, memory consumption increases over time until none is left and the tool starts reporting errors. I took a screenshot of htop, which I'm attaching, during execution while everything still worked, and I noticed dozens of phantomjs processes, although I set the tool to run just 1 worker. If I kill all these processes, everything goes back to normal.

The command used was the following:

webscreenshot --no-xserver -r phantomjs -w 1 --window-size 640,360 -o output_dir -i servers_file

That was run inside a screen session, although I don't think that would make a difference.

There could be an issue in the way phantomjs processes are terminated. It would be great if this could be reviewed.

Thanks in advance,

Htop screenshot:
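
If the leftover processes turn out to be children of commands that hit the timeout (an assumption, since the cause is not established in this report), one direction would be to start each renderer in its own session and kill the whole process group on timeout. This is a POSIX-only sketch with a hypothetical function name, not the tool's current behaviour.

```python
import os
import signal
import subprocess
import sys


def run_with_group_timeout(argv, timeout):
    """Run argv; on timeout, kill the entire process group it created.

    Returns the exit code, or None if the command had to be killed.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child becomes leader of a new group
    )
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process ended just after the timeout fired
        proc.wait()  # reap the child so it does not linger as a zombie
        return None


if __name__ == "__main__":
    print(run_with_group_timeout([sys.executable, "-c", "pass"], 10))
```

Killing by group rather than by single PID matters when the renderer is wrapped by another launcher such as xvfb-run, since the wrapper's children would otherwise survive.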

Try to make it Python 2 and 3 compatible

Enhancement: Functionality to get a result of error url list

It would be good to have a feature to get those failed screenshot URL list to run over those manually or by another tool.

Windows support

As of today, the script only works on Linux.

OUTPUT file name

How can I change the output file name?

custom filename with timestamp suffix

Hi, this is a great tool.
I'd really like to be able to change the output filename for each png saved. Ideally, there would be a command line parameter that would simply append a datestamp as a string to the existing filename, which is the url. Is this something that is possible through the command line, or would I have to fork the repo and adjust this line to add an option to append it?
You mention there are some length constraints even below 255 for certain renderers. Are those constraints documented somewhere? What ballpark should be avoided?
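
Until such an option exists, a post-processing step could append the timestamp after the run. This is a minimal sketch with a hypothetical helper name; it assumes the screenshots are PNG files in a single output directory and that it is run once per batch (running it twice would stamp the files twice).

```python
from datetime import datetime
from pathlib import Path


def add_timestamp_suffix(directory, pattern="*.png", stamp=None):
    """Rename 'name.png' to 'name_<stamp>.png' for every match in directory."""
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    renamed = []
    # sorted() materializes the listing before any file is renamed
    for path in sorted(Path(directory).glob(pattern)):
        target = path.with_name("%s_%s%s" % (path.stem, stamp, path.suffix))
        path.rename(target)
        renamed.append(target)
    return renamed
```

Keep in mind that the suffix adds 16 characters to each name, which counts against any file name length limit.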

Screenshot went some wrong

Hi,
I like your tool.
But I have an error on some pages, like this:
[ERROR][https://......] Screenshot somehow failed

I think it is a problem because of an embedded youtube video

why the screenshot folder is empty

I have installed this tool successfully. I ran python webscreenshot.py -i test.txt, but the screenshot directory is empty. Why?

Best quality possible?

First I have to say that it's a great tool. I am so happy that I found it!

I now need it for screenshots that I process with the cv2 library, and the image quality leads to about 10% errors.
Here you can find a part of it.

My settings are

quality=100
window_size=2400,1000
format=jpg
...

Of course, I tried around with the numbers. But with highest quality the window size does not change a lot.

Any ideas or hints?
Thanks!

Obtain output file name

Hello,

How can I obtain the output file name after a successful screenshot while calling from inside of another Python script?

Replace phantomjs by headless Google Chrome

Hi !

PhantomJS's only maintainer has given up on maintaining the software, following the release of the headless mode in Google Chrome 59.

Have you considered migrating webscreenshot's backend accordingly?

Thanks in advance !

😛

Pass CSS or JS to page to disable notifications

Would be nice to be able to pass CSS or JS to the webpage to hide cookie notifications.

This platform lacks of sem_open

Using Termux on Android.

Solution:

try:
    from multiprocessing import Process, Queue
except ImportError:
    from threading import Thread as Process
    from queue import Queue

asciinema/asciinema#271 (comment)

Shell command PID xxxx returned abnormal error code '-6'.

Hello, having some issues getting this to run. Just for a test here is what is happening:

list.txt
google.com
twitter.com

command being issued:
python webscreenshot.py -i testlist.txt -vv

Output:

python webscreenshot.py -i testlist.txt -vv

webscreenshot.py version 1.8

[DEBUG][General] Options: {'log_level': 'DEBUG', 'http_username': None, 'input_file': 'testlist.txt', 'workers': 2, 'output_directory': None, 'header': None, 'verbosity': 2, 'cookie': None, 'proxy': None, 'timeout': 30, 'proxy_auth': None, 'http_password': None, 'ssl': False, 'port': None}

[INFO][General] 'google.com' has been formatted as 'http://google.com:80' with supplied overriding options
[+] 1 URLs to be screenshot
[DEBUG][http://google.com:80] Shell command to be executed
'phantomjs --ignore-ssl-errors true --ssl-protocol any "/home/osintscreenshot/webscreenshot/original/webscreenshot.js" url_capture="http://google.com:80" output_file="/home/osintscreenshot/webscreenshot/original/screenshots/http_google.com_80.png"'

[ERROR][http://google.com:80] Shell command PID 29124 returned an abnormal error code: '-6'
[ERROR][http://google.com:80] Screenshot somehow failed

[+] 0 actual URLs screenshot
[+] 1 error(s)
http://google.com:80

Any ideas?

Wait for some time (until the webpage is fully loaded) before taking the screenshot

Hi, I am not very familiar with phantomjs and chrome's api.
So how should I change the source code to take the screenshot after the webpage is fully-loaded ?

cannot ignore certificate error

I cannot find any way to make the application ignore SSL certificate errors.

HTTP Password with space in it doesn't work

Does the password need to be written in quotes here?

cmd_parameters.append('http_password=%s' % options.http_password) if options.http_password != None else None
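
Since the assembled command string is later split with `shlex.split`, one way to keep a password containing spaces in a single argument would be to quote the value when building the parameter. The sketch below only illustrates that round trip on POSIX quoting rules; `password_parameter` is a hypothetical helper, `shlex.quote` requires Python 3.3+, and Windows (where the script splits with `posix=False`) would need a different approach.

```python
import shlex


def password_parameter(password):
    """Build the http_password parameter so that it survives shlex.split."""
    return "http_password=%s" % shlex.quote(password)


# Round trip: the quoted value comes back as a single token
command = "phantomjs webscreenshot.js " + password_parameter("my secret")
tokens = shlex.split(command)
```

Here `tokens[-1]` is `http_password=my secret`, whereas the unquoted form would have been split into two separate arguments.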

renderer binary could not have been found in your current PATH environment variable.

When I try to run webscreenshot after installing the requirements,
it gives me this error:
webscreenshot.py version 2.4

[+] 4 URLs to be screenshot
[ERROR][https://173.194.67.113:443] renderer binary could not have been found in your current PATH environment variable, exiting
[ERROR][http://173.194.67.113:80] renderer binary could not have been found in your current PATH environment variable, exiting
[ERROR][https://duckduckgo.com:443/robots.txt] renderer binary could not have been found in your current PATH environment variable, exiting
[ERROR][http://google.fr:80] renderer binary could not have been found in your current PATH environment variable, exiting
[+] 4 actual URLs screenshot
[+] 0 error(s)

Unable to get complete url screenshot

3D maps from maps.google.com do not work

python webscreenshot.py -r chromium 'https://www.google.com.ph/maps/@55.6652884,12.4946009,89a,35y,254.22h,47.58t/data=!3m1!1e3?hl=en' gives the loading image and not the full 3D map as you get in a browser.

AttributeError: 'Namespace' object has no attribute 'format'

This is the error code (line 361, in the craft_cmd function):

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/XUSERX/.local/lib/python3.6/site-packages/webscreenshot/webscreenshot.py", line 361, in craft_cmd
    output_format = options.format if options.renderer == 'phantomjs' else 'png'
AttributeError: 'Namespace' object has no attribute 'format'

Would be great if I could get it run!
Thanks for help.

Request of feature: URL with image

It would be great if you can add feature to show URL with the screenshot

Redirect Support / Nodejs Webpages / Additional Logging?

Does webscreenshot support redirects when taking screenshots?

For example, this URL generates a blank screenshot with chrome as the web driver and webscreenshot version 2.4:

https://mohaaaa.co.uk

The below URL is for a node.js powered site:

https://videos.dinofly.com

When it is screenshot, it doesn't appear to be loading all of the JavaScript / finishing the rendering as it would appear to an end user? Any idea on this one?

Also, rather than outputting messages such as Screenshot somehow failed, is there a way to find out why it somehow failed? I think it would be nice to have more info if possible.

Using tool with external python script on a Jupyter Notebook

Good day, I have installed webscreenshot.
I tried running it in Python, but I ran into this error:

[+] 2 URLs to be screenshot

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\z0014071\webscreenshot.py", line 361, in craft_cmd
    output_format = options.format if options.renderer == 'phantomjs' else 'png'
AttributeError: 'Namespace' object has no attribute 'format'
"""

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
<ipython-input-18-74b6b93cdc5a> in <module>()
      9 
     10 # actually launching the function
---> 11 take_screenshot(url_list, options)

~\webscreenshot.py in take_screenshot(url_list, options)
    437     pool = multiprocessing.Pool(processes=int(options.workers), initializer=init_worker)
    438 
--> 439     taken_screenshots = [r for r in pool.imap(func=craft_cmd, iterable=izip(url_list, itertools.repeat(options)))]
    440 
    441     screenshots_error_url = [url for retval, url in taken_screenshots if retval == SHELL_EXECUTION_ERROR]

~\webscreenshot.py in <listcomp>(.0)
    437     pool = multiprocessing.Pool(processes=int(options.workers), initializer=init_worker)
    438 
--> 439     taken_screenshots = [r for r in pool.imap(func=craft_cmd, iterable=izip(url_list, itertools.repeat(options)))]
    440 
    441     screenshots_error_url = [url for retval, url in taken_screenshots if retval == SHELL_EXECUTION_ERROR]

C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py in next(self, timeout)
    746         if success:
    747             return value
--> 748         raise value
    749 
    750     __next__ = next                    # XXX

AttributeError: 'Namespace' object has no attribute 'format'

Here is my code:

import argparse
from webscreenshot import *

# url list to screenshot
url_list = ['http://google.de', 'http://google.com']

# defining options manually
options = argparse.Namespace(URL=None, cookie=None, header=None, http_password=None, http_username=None, input_file=None, log_level='DEBUG', multiprotocol=False, no_xserver=False, output_directory='/tmp/screenshots', port=None, proxy=None, proxy_auth=None, proxy_type=None, renderer='phantomjs', renderer_binary=None, ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4)

# actually launching the function
take_screenshot(url_list, options)

taken from
#19 (comment)
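
The traceback suggests that the hand-built `Namespace` lacks attributes the script reads, here `format`. A hedged way around this is to start from a complete set of defaults and override only what is needed. The values below are copied from the `Options: Namespace(...)` debug line printed by version 2.91 elsewhere on this page; `make_options` is a hypothetical helper, and other versions (2.93 and 2.94 added further options) may expect additional attributes.

```python
import argparse

# Defaults as printed by the version 2.91 debug output (assumed complete for 2.91 only)
DEFAULT_OPTIONS = dict(
    URL=None, ajax_max_timeouts='1400,1800', cookie=None, crop=None,
    format='png', header=None, http_password=None, http_username=None,
    imagemagick_binary=None, input_file=None, label=False,
    label_bg_color='NavajoWhite', label_size=60, log_level='DEBUG',
    multiprotocol=False, no_xserver=False,
    output_directory='/tmp/screenshots', port=None, proxy=None,
    proxy_auth=None, proxy_type=None, quality=75, renderer='phantomjs',
    renderer_binary=None, ssl=False, timeout=30, verbosity=2,
    window_size='1200,800', workers=4,
)


def make_options(**overrides):
    """Return an argparse.Namespace with every default set, plus overrides."""
    values = dict(DEFAULT_OPTIONS)
    values.update(overrides)
    return argparse.Namespace(**values)
```

The result of `make_options(workers=2)` can then be passed where the snippet above builds its `options` object by hand.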

How to use webscreenshot from inside a python script?

The documentation states:

pip install webscreenshot and then directly use webscreenshot

How does one directly use webscreenshot?

My python script contains:

import webscreenshot

Now, how do I call webscreenshot directly from the script? The documentation doesn't provide any examples. It does for calling the script from the commandline and passing arguments, but I want to call it directly from inside my python script.

webscreenshot.take_screenshot(list_of_urls) doesn't seem to work.
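
One approach that avoids depending on the module's internal functions is to invoke the installed `webscreenshot` command from Python. The sketch below uses only flags listed in the Options section (`-i`, `-o`, `-r`, `-w`); the function names are hypothetical, and it assumes the `webscreenshot` entry point installed by pip is on the PATH.

```python
import subprocess


def build_command(input_file, output_directory=None, renderer=None, workers=None):
    """Assemble the argument list for the webscreenshot command line."""
    command = ["webscreenshot", "-i", input_file]
    if output_directory is not None:
        command += ["-o", output_directory]
    if renderer is not None:
        command += ["-r", renderer]
    if workers is not None:
        command += ["-w", str(workers)]
    return command


def run_webscreenshot(input_file, **kwargs):
    """Run webscreenshot as a subprocess and return its exit code."""
    return subprocess.run(build_command(input_file, **kwargs)).returncode
```

A call such as `run_webscreenshot("list.txt", output_directory="shots")` then behaves like the command-line examples in the README.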

Screenshot not being saved

Dear Maaaaz

First of all, thank you so much for such a great tool (even though I am not able to use it correctly). I am running the latest Raspbian OS (32-bit) on my Raspberry Pi 4, and the installation worked out pretty well. However, if I run the script as intended, it neither saves the screenshot nor returns any errors.

All the folders in which I am working have permission 777, including the screenshots folder, which was successfully created by your program. I have tried to force SSL and tried different websites, but none of them works. I used to use the renderer binary from Chromium, but as I saw no results I switched to PhantomJS, and still nothing.

Here is the debug info (though I don't know if this is worth anything for you):

root@**hostname**:/somepath/# python webscreenshot -vv --renderer-binary /opt/phantomjs-2.1.1-linux-i686/bin/phantomjs 192.168.0.220
webscreenshot.py version 2.91

[DEBUG][General] Options: Namespace(URL='192.168.0.220', ajax_max_timeouts='1400,1800', cookie=None, crop=None, format='png', header=None, http_password=None, http_username=None, imagemagick_binary=None, input_file=None, label=False, label_bg_color='NavajoWhite', label_size=60, log_level='DEBUG', multiprotocol=False, no_xserver=False, output_directory='/somepath/screenshots', port=None, proxy=None, proxy_auth=None, proxy_type=None, quality=75, renderer='phantomjs', renderer_binary='/opt/phantomjs-2.1.1-linux-i686/bin/phantomjs', ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4)

[INFO][General] '192.168.0.220' has been formatted as 'http://192.168.0.220:80' with supplied overriding options
[+] 1 URLs to be screenshot
[DEBUG][http://192.168.0.220:80] Shell command to be executed
'/opt/phantomjs-2.1.1-linux-i686/bin/phantomjs --ignore-ssl-errors=true --ssl-protocol=any --ssl-ciphers=ALL "/usr/local/lib/python3.7/dist-packages/webscreenshot/webscreenshot.js" url_capture=http://192.168.0.220:80 output_file="/somepath/screenshots/http_192.168.0.220_80.png" width=1200 height=800 format=png quality=75 ajaxtimeout=1400 maxtimeout=1800'

[+] 1 actual URLs screenshot
[+] 0 error(s)

As you can see, I am running the latest version of everything and the system is up to date :) Any ideas?

Best regards
Marco Leder

Transparent background

When dealing with websites with transparent background, it's hard to take a look at the captures.

Could you save the captures as jpg files or force a blank background ?

Thx.

English issue

According to the Merriam-Webster dictionary (http://www.merriam-webster.com/dictionary/shoot), the past participle of "shoot" is "shot". Therefore:
"[+] 15 URLs to be screenshotted" => " [...] to be screenshot"

Screenshot somehow failed on some websites / Shell command abnormal code

Hey!
I am trying to use your program to download ~2k screenshots, running it for ~150 links at a time.
For some websites it just doesn't work, saying both "Screenshot somehow failed" and "Shell command PID ---- returned abnormal error code '1' ". I ran it with xvfb, without xvfb, with -vv, without -vv.
The websites that don't work aren't consistent; if I run it now, I might get 5/31 (list with links that fail), if I run it again, I might get 4 or 6. Also, running it with -m only helps with some of the websites, still remaining a handful of them that don't work on either :443 and :80.
I have the latest PhantomJS version, I have installed webscreenshot from pip and I also cloned the repo, ran it both ways a number of times.
The network I'm connected to does not have any restrictions.
The website list is composed of the first few thousand links from the Alexa top 1 million.

Domain does not get formatted automatically, which makes the command fail

I tried to execute it from within Python code but it kept failing. I took the DEBUG output and compared it with the output produced when the command is run directly from the shell.
It turned out that when I provide a list of domains from within other Python code, the domains aren't converted from google.com to http://google.com:80, which makes the command fail.

Screenshot when ran inside python

Screenshot when ran directly
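
A caller that drives the tool from its own Python code could pre-format bare domains before handing them over. The helper below is only a sketch: `normalize_target` and `DEFAULT_PORTS` are hypothetical names, not part of webscreenshot, and the logic only mirrors the `google.com` to `http://google.com:80` conversion described above.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of default ports, for illustration only.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_target(target, default_scheme="http"):
    """Return the target with an explicit scheme and port.

    Simplification: user-info and IPv6 literals are not handled.
    """
    target = target.strip()
    if "://" not in target:
        # Bare domain such as "google.com": prepend the default scheme.
        target = "%s://%s" % (default_scheme, target)
    parts = urlsplit(target)
    # Keep an explicit port, otherwise fall back to the scheme's default.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 80)
    netloc = "%s:%d" % (parts.hostname, port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


targets = [normalize_target(t) for t in ["google.com", "192.168.0.220"]]
```

Normalizing in the caller keeps the list identical to what the shell invocation would see, which makes the two DEBUG outputs directly comparable.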

Some lazy-rendering question.

When I use webscreenshot to screenshot http://mp.weixin.qq.com/s/AZa0at-yu0oVyzCGtsrPIQ ,
I found that some of the images come out grey, as if still loading.
Is there any solution?

error renderer binary could not have been found in your current PATH environment variable, exiting

I am having some problems when using the tool

error:
[ERROR][http://website:port] renderer binary could not have been found in your current PATH environment variable, exiting

Issue with PATH env

Got some issue with the PATH env.
I'm trying to make it work under Win7.

set PHANTOMJS_BIN="C:\Program Files\phantomjs"

[WinError 2] The system cannot find the file specified
[ERROR][http://abc.xyz:80] renderer binary could not have been found in your current PATH environment variable, exiting

Tried with explicitly adding -r phantomjs, or chrome, chromium, none of them work.
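
A quick way to check what this error is complaining about is to ask Python where it would find the renderer. This is a minimal sketch, not webscreenshot's own lookup: `locate_renderer` is a hypothetical helper, and it assumes the renderer is either reachable through PATH or given as a full path to the executable (as with the `--renderer-binary` option used in another report on this page).

```python
import os
import shutil


def locate_renderer(name="phantomjs", explicit_path=None):
    """Return a usable path to the renderer executable, or None."""
    if explicit_path:
        # cmd.exe keeps the quotes in `set VAR="..."`, so drop them defensively.
        candidate = explicit_path.strip('"')
        # The value has to be the executable itself, not its parent directory.
        return candidate if os.path.isfile(candidate) else None
    # Otherwise search the directories listed in PATH, as a shell would.
    return shutil.which(name)


if locate_renderer("phantomjs") is None:
    print("phantomjs is not reachable through PATH")
```

If the function returns None for a value such as `"C:\Program Files\phantomjs"`, the path probably points at a directory rather than at `phantomjs.exe`.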

Encoding issues

Apparently the script doesn't work unless the input file is encoded in "UTF-8 without BOM".
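
Until the script handles this itself, the input file can be read in a way that tolerates a byte order mark. The sketch below assumes one target per line; `read_targets` is a hypothetical helper, not a function from webscreenshot.

```python
def read_targets(path):
    """Read one target per line, tolerating an optional UTF-8 BOM."""
    # 'utf-8-sig' decodes plain UTF-8 and drops a leading BOM if present.
    with open(path, "r", encoding="utf-8-sig") as handle:
        return [line.strip() for line in handle if line.strip()]
```

The `utf-8-sig` codec is part of the Python standard library, so no extra dependency is needed.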

Window Size Configuration (either default or not) Doesn't Seem to Work

Hello maaaaz, and thanks again for this great tool and the support. Today I am reporting an issue that seems pretty obvious to me, so I don't know if it happens under certain conditions in my environments, although I tested it both on Linux and Mac. Let me know if I can provide more data than the following:

Issue: Output screenshots contain the whole page, resulting in very tall images, even though the default height should be 800 px. The same happens when using the --window-size argument.
Tested Version: webscreenshot 2.8 (but it happened with older versions too)
Tested OS: Linux and Mac
Reproduction steps on Linux: Run the following command and check that the output image is more than 360 pixels high.

python3 webscreenshot.py --no-xserver -r phantomjs -w 1 --window-size 640,360 -o ~ https://www.apple.com/mx/mac/

Reproduction steps on Mac: Run the following command and check that the output image is more than 360 pixels high.

python3 webscreenshot.py --no-xserver -r phantomjs -w 1 --window-size 640,360 -o ~ https://www.apple.com/mx/mac/

Output Image:

error code: 1

I tried the solution given by @putsi but it didn't work. It still gives me the same error.
Command: xvfb-run python webscreenshot.py -i file.txt -o ~/dir/dir/result.txt -w 20 -a "X-FORWARDED-FOR: 127.0.0.1"

So what should I do now?

Python 2.7/3.7

Is this program for Python 3.7? When I try to run the program it says "Missing parentheses in call to 'print'. Did you mean print("[+] %s URLs to be screenshot" % screenshot_number)?". So I assumed it was written for Python 2.7. Would you like me to add the parentheses? Or maybe you could add a note that it's only for Python 2.7.
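
For reference, a print call with a single parenthesized argument runs unchanged under both interpreters, which is the form the error message suggests. The snippet is only an illustration; the value of `screenshot_number` is a placeholder.

```python
screenshot_number = 15  # placeholder value for illustration

# One parenthesized argument: valid as a Python 2 print statement
# and as a Python 3 print() call.
message = "[+] %s URLs to be screenshot" % screenshot_number
print(message)
```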

Ajax actions before taking the snapshot

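Would it be possible to perform an action before taking the snapshots?
For example, clicking a button inside the web page to show some hidden information:

from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By

def lookup(driver):
    driver.get("url")
    try:
        #box = driver.wait.until(EC.presence_of_element_located((By.NAME, "q")))

        #button = driver.find_element(By.CLASS_NAME, 'oc-home-search__button')
        # Click the button that reveals the hidden information.
        button = driver.find_element(By.ID, 'showPhone')
        #box.send_keys(query)
        button.click()

        # Read the element that appears after the click.
        image = driver.find_element(By.CLASS_NAME, 'telnumber')
        print(image.get_attribute('element'))
        abc = driver.find_element(By.XPATH, "//div[contains(@style, 'background-image')]")
        #print(image.get_attribute("url"))

    # find_element raises NoSuchElementException; the commented-out wait raises TimeoutException.
    except (NoSuchElementException, TimeoutException):
        print("Box or Button not found!")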

Crop/resize image?

The screenshots are always the full height of the webpage.

/usr/bin/env python3 webscreenshot.py --no-xserver -f jpg -q 80 --window-size 1600,768 blogger.com
I was wondering if it is possible to define the viewport?

Shell command PID XXXXX returned an abnormal error code: '-11'

Hi,
I'm getting this error on this specific URL. It does create the screenshot OK, but the error appears.
Screenshots on other links in the same domain seem to work ok.

urls.txt:
http://www.milenio.com/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza

Command used:
webscreenshot -vv -i urls.txt -o news_extractor/static/screenshots --ajax-max-timeouts 2500,2500

Output:

webscreenshot.py version 2.94

[DEBUG][General] Options: Namespace(URL=None, ajax_max_timeouts='2500,2500', cookie=None, crop=None, custom_js=None, format='png', header=None, http_password=None, http_username=None, imagemagick_binary=None, input_file='urls.txt', label=False, label_bg_color='NavajoWhite', label_size=60, log_level='DEBUG', multiprotocol=False, no_error_file=False, no_xserver=False, output_directory='/Users/aldricmp1/PycharmProjects/newsextractorapp/news_extractor/static/screenshots', port=None, proxy=None, proxy_auth=None, proxy_type=None, quality=75, renderer='phantomjs', renderer_binary=None, single_output_file=None, ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4)

[INFO][General] 'http://www.milenio.com/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza' has been formatted as 'http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza' with supplied overriding options
[+] 1 URLs to be screenshot
[DEBUG][http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza] Shell command to be executed
'phantomjs --ignore-ssl-errors=true --ssl-protocol=any --ssl-ciphers=ALL "/Users/aldricmp1/opt/anaconda3/envs/newsextractorapp/lib/python3.6/site-packages/webscreenshot/webscreenshot.js" url_capture=http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza output_file="/Users/aldricmp1/PycharmProjects/newsextractorapp/news_extractor/static/screenshots/http_www.milenio.com_80_politica_lopez-gatell-habra-hospitales-llenos-por-epoca-influenza.png" width=1200 height=800 format=png quality=75 ajaxtimeout=2500 maxtimeout=2500'

[ERROR][http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza] Shell command PID 91585 returned an abnormal error code: '-11'
[ERROR][http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza] Screenshot somehow failed

[+] 0 actual URLs screenshot
[+] 1 error(s)
http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza
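
A note on reading the code itself: when Python's subprocess module reports a negative return code on Linux or macOS, it means the child process was terminated by a signal with that number. The sketch below decodes such a value; `describe_returncode` is a hypothetical helper and not part of webscreenshot.

```python
import signal


def describe_returncode(code):
    """Explain a subprocess return code in plain words."""
    if code < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return "terminated by signal %d (%s)" % (-code, name)
    if code == 0:
        return "exited normally"
    return "exited with status %d" % code


print(describe_returncode(-11))
```

On Linux and macOS, signal 11 is SIGSEGV, so a '-11' here would suggest that the renderer process crashed with a segmentation fault rather than that the command line was wrong. That reading is an inference from the return code alone.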

Shell command PID xxxx returned an abnormal error code: '4294967295'

I got an error message.

How can I solve it?

My environment:

Windows10 64bit
python 3.8
phantomjs 2.5.0-development
webscreenshot 2.9

C:\Program Files\Python38\Lib\site-packages\webscreenshot> python .\webscreenshot.py http://www.google.co.kr -r phantomjs --renderer-binary "C:\Program Files\phantomjs-2.5.0-beta-windows\bin\phantomjs.exe"

webscreenshot.py version 2.9

[+] 1 URLs to be screenshot
[ERROR][http://www.google.co.kr:80] Shell command PID 5012 returned an abnormal error code: '4294967295'
[ERROR][http://www.google.co.kr:80] Screenshot somehow failed

[+] 0 actual URLs screenshot
[+] 1 error(s)
    http://www.google.co.kr:80
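
The number in this message is easier to read once converted: 4294967295 is 2^32 - 1, the unsigned 32-bit form of -1. The sketch below does the conversion; `to_signed32` is a hypothetical helper used only for illustration.

```python
def to_signed32(code):
    """Reinterpret an unsigned 32-bit exit code as a signed integer."""
    code &= 0xFFFFFFFF
    # Values with the top bit set represent negative numbers.
    return code - 0x100000000 if code >= 0x80000000 else code


print(to_signed32(4294967295))  # prints -1
```

So the renderer exited with status -1, which says that it failed but not why; the renderer's own output would be needed to narrow it down.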

Issue in URL Pre-Formatter prefixes https protocol when port is 443 although http is provided

When webscreenshot is supplied a list of URLs and one of them is http://domain.com:443, notice the http protocol instead of https, the formatter changes the http protocol to https.

Example (the real domain was changed to domain.com):

webscreenshot -vv http://domain.com:443
[INFO][General] 'http://domain.com:443' has been formatted as 'https://domain.com:443' with supplied overriding options

I understand that 443 is the port typically used for https, but it is also possible to run http, or any other protocol, on 443. I have seen this while doing recon: http (not https) served on 443.

When someone supplies a list already providing the protocol and the port, I believe it doesn't make sense to run the pre formatter.
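
The requested rule, "an explicit protocol wins over a guess based on the port", could be sketched like this. `pick_scheme` is a hypothetical function written for illustration; it is not the pre-formatter used by webscreenshot.

```python
from urllib.parse import urlsplit


def pick_scheme(target):
    """Return the scheme to use for a target string."""
    if "://" in target:
        # The user stated the protocol: keep it, whatever the port is.
        return urlsplit(target).scheme
    # No protocol given: only now fall back to guessing from the port.
    port = urlsplit("//" + target).port
    return "https" if port == 443 else "http"


print(pick_scheme("http://domain.com:443"))
print(pick_scheme("domain.com:443"))
```

With this rule, `http://domain.com:443` stays on http, while a bare `domain.com:443` is still treated as https.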

workaround for 403 or Forbidden

https://stackoverflow.com/questions/59954122/phantomjs-page-render-blank-white-background-image-on-403

page.render doesn't seem to work if the page returns a 403 status or Forbidden.
Do you have any workaround in mind to get the actual screenshot of the page instead of a transparent or plain background image?

Webscreenshot.py making a weird screenshot (used within different python script)

Hi, I'm currently using the take_screenshot function from another Python script, with Python 3.6 on Ubuntu.

options = argparse.Namespace(URL=None, ajax_max_timeouts='1400,1800', cookie=None, crop=None, format='png', header=None, http_password=None, http_username=None, imagemagick_binary=None, input_file=None, label=False, log_level='DEBUG', multiprotocol=False, no_xserver=True, output_directory=path, port=None, proxy=None, proxy_auth=None, proxy_type=None, quality=75, renderer='phantomjs', renderer_binary=None, ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4) -> take_screenshot(url_container, options)

The url container just contains a bunch of URLs.

The output generated for a specific page that I want to monitor looks like this

but it should look more like this

Why does this happen?

No option to wait until page is loaded

I am trying to screenshot a webpage that takes a few moments to fully load, and the script is simply running and screenshotting before the page can load. Is there an option to set a sleep period? Or could one be added?
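
One thing that may be worth trying is the `--ajax-max-timeouts` option that appears in another report on this page, where `2500,2500` ends up as `ajaxtimeout=2500 maxtimeout=2500` on the PhantomJS command line. The sketch below only builds such a command from Python; `build_command` is a hypothetical helper, the values are arbitrary, and the assumption that they are milliseconds is not confirmed here.

```python
def build_command(url, ajax_timeout, max_timeout, output_dir="screenshots"):
    """Build a webscreenshot command line with longer rendering timeouts."""
    return [
        "webscreenshot",
        "-o", output_dir,
        # Format taken from the report above: "<ajax timeout>,<max timeout>".
        "--ajax-max-timeouts", "%d,%d" % (ajax_timeout, max_timeout),
        url,
    ]


command = build_command("http://example.com", 5000, 10000)
# To actually run it: subprocess.run(command, check=False)
```

Whether larger values are enough depends on how the page loads its content, so this is a starting point rather than a guaranteed fix.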

Cookie parameter ignored when using chrome as engine

Using this command does not work, as the option -c seems to be ignored:

webscreenshot -v -r chromium --no-xserver -c COOKIE "JSESSIONID=1234; YOLO=SWAG" http://grim.at/tmp/cookie.php

Looking in the source code it seems as if the option is only present using phantomjs - can you please fix this?

Thanks in advance!
Wolfgang