webscreenshot's Introduction

webscreenshot

Description

A simple script to screenshot a list of websites, based on the url-to-image PhantomJS script.

Features

  • Integrating url-to-image 'lazy-rendering' for AJAX resources
  • Fully functional on Windows and Linux systems
  • Cookie and custom HTTP header definition support for the PhantomJS renderer
  • Multiprocessing and killing of unresponding processes after a user-definable timeout
  • Accepting several formats as input target
  • Customizing screenshot size (width, height), format and quality
  • Mapping useful options of PhantomJS such as ignoring ssl error, proxy definition and proxy authentication, HTTP Basic Authentication
  • Supports multiple renderers:
    • PhantomJS, which is legacy and abandoned but the one still producing the best results
    • Chromium, Chrome and Edge Chromium, which will replace PhantomJS but currently have some limitations: screenshotting an HTTPS website that does not have a valid certificate, for instance a self-signed one, will produce an empty screenshot.
      The reason is that the --ignore-certificate-errors option doesn't work and will never work anymore: the solution is to use a proper webdriver, but to date webscreenshot doesn't aim to support this rather complex method requiring some third-party tools.
    • Firefox can also be used as a renderer but has some serious limitations (so don't use it for the moment):
      • Cannot take several screenshots at the same time: the firefox process does not support multiple instances
      • No incognito mode, using webscreenshot will pollute your browsing history
  • Embedding screenshot URL in image (requires ImageMagick)

Usage

Put your targets in a text file and pass it with the -i option, or as a positional argument if you have just a single URL.
Screenshots will be available, by default, in your current ./screenshots/ directory.
Accepted input formats are the following:

http(s)://domain_or_ip:port(/resource)
domain_or_ip:port(/resource)
domain_or_ip(/resource)
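
The verbose runs in the Examples section show bare targets being expanded into a full scheme://host:port form. A minimal sketch of that expansion follows. It assumes a hypothetical helper named normalize_target and default ports 80 and 443; it is not the script's own code and may differ from it on edge cases.

```python
import re

# Hypothetical helper, not part of webscreenshot: expands a target written in
# one of the accepted input formats into scheme://host:port(/resource).
TARGET_RE = re.compile(
    r"^(?:(?P<scheme>https?)://)?"   # optional http:// or https://
    r"(?P<host>[^:/]+)"              # domain name or IP address
    r"(?::(?P<port>\d+))?"           # optional :port
    r"(?P<resource>/.*)?$"           # optional /resource
)


def normalize_target(target):
    match = TARGET_RE.match(target.strip())
    if match is None:
        raise ValueError("unsupported target format: %r" % target)
    scheme = match.group("scheme") or "http"
    # Assumption: the default port follows the scheme (80 for http, 443 for https).
    port = match.group("port") or ("443" if scheme == "https" else "80")
    resource = match.group("resource") or ""
    return "%s://%s:%s%s" % (scheme, match.group("host"), port, resource)
```

For instance, normalize_target("google.fr") gives the same string as the one logged in the single-URL example.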

Options

webscreenshot.py version 2.94

usage: webscreenshot.py [-h] [-i INPUT_FILE] [-o OUTPUT_DIRECTORY] [-w WORKERS] [-v] [--no-error-file] [-z SINGLE_OUTPUT_FILE] [-p PORT] [-s] [-m]
                        [-r {phantomjs,chrome,chromium,edgechromium,firefox}] [--renderer-binary RENDERER_BINARY] [--no-xserver] [--window-size WINDOW_SIZE]
                        [-f {pdf,png,jpg,jpeg,bmp,ppm}] [-q [0-100]] [--ajax-max-timeouts AJAX_MAX_TIMEOUTS] [--crop CROP] [--custom-js CUSTOM_JS] [-l]
                        [--label-size LABEL_SIZE] [--label-bg-color LABEL_BG_COLOR] [--imagemagick-binary IMAGEMAGICK_BINARY] [-c COOKIE] [-a HEADER]
                        [-u HTTP_USERNAME] [-b HTTP_PASSWORD] [-P PROXY] [-A PROXY_AUTH] [-T PROXY_TYPE] [-t TIMEOUT]
                        [URL]

optional arguments:
  -h, --help            show this help message and exit

Main parameters:
  URL                   Single URL target given as a positional argument
  -i INPUT_FILE, --input-file INPUT_FILE
                        <INPUT_FILE> text file containing the target list. Ex: list.txt
  -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
                        <OUTPUT_DIRECTORY> (optional): screenshots output directory (default './screenshots/')
  -w WORKERS, --workers WORKERS
                        <WORKERS> (optional): number of parallel execution workers (default 4)
  -v, --verbosity       <VERBOSITY> (optional): verbosity level, repeat it to increase the level { -v INFO, -vv DEBUG } (default verbosity ERROR)
  --no-error-file       <NO_ERROR_FILE> (optional): do not write a file with the list of URL of failed screenshots (default false)
  -z SINGLE_OUTPUT_FILE, --single-output-file SINGLE_OUTPUT_FILE
                        <SINGLE_OUTPUT_FILE> (optional): name of a file which will be the single output of all inputs. Ex. test.png

Input processing parameters:
  -p PORT, --port PORT  <PORT> (optional): use the specified port for each target in the input list. Ex: -p 80
  -s, --ssl             <SSL> (optional): enforce SSL/TLS for every connection
  -m, --multiprotocol   <MULTIPROTOCOL> (optional): perform screenshots over HTTP and HTTPS for each target

Screenshot renderer parameters:
  -r {phantomjs,chrome,chromium,edgechromium,firefox}, --renderer {phantomjs,chrome,chromium,edgechromium,firefox}
                        <RENDERER> (optional): renderer to use among 'phantomjs' (legacy but best results), 'chrome', 'chromium', 'edgechromium', 'firefox'
                        (version > 57) (default 'phantomjs')
  --renderer-binary RENDERER_BINARY
                        <RENDERER_BINARY> (optional): path to the renderer executable if it cannot be found in $PATH
  --no-xserver          <NO_X_SERVER> (optional): if you are running without an X server, will use xvfb-run to execute the renderer (by default, trying to
                        detect if DISPLAY environment variable exists

Screenshot image parameters:
  --window-size WINDOW_SIZE
                        <WINDOW_SIZE> (optional): width and height of the screen capture (default '1200,800')
  -f {pdf,png,jpg,jpeg,bmp,ppm}, --format {pdf,png,jpg,jpeg,bmp,ppm}
                        <FORMAT> (optional, phantomjs only): specify an output image file format, "pdf", "png", "jpg", "jpeg", "bmp" or "ppm" (default
                        'png')
  -q [0-100], --quality [0-100]
                        <QUALITY> (optional, phantomjs only): specify the output image quality, an integer between 0 and 100 (default 75)
  --ajax-max-timeouts AJAX_MAX_TIMEOUTS
                        <AJAX_MAX_TIMEOUTS> (optional, phantomjs only): per AJAX request, and max URL timeout in milliseconds (default '1400,1800')
  --crop CROP           <CROP> (optional, phantomjs only): rectangle <t,l,w,h> to crop the screen capture to (default to WINDOW_SIZE: '0,0,w,h'), only
                        numbers, w(idth) and h(eight). Ex. "10,20,w,h"
  --custom-js CUSTOM_JS
                        <CUSTOM_JS> (optional, phantomjs only): path of a file containing JavaScript code to be executed before taking the screenshot. Ex:
                        js.txt

Screenshot label parameters:
  -l, --label           <LABEL> (optional): for each screenshot, create another one displaying inside the target URL (requires imagemagick)
  --label-size LABEL_SIZE
                        <LABEL_SIZE> (optional): font size for the label (default 60)
  --label-bg-color LABEL_BG_COLOR
                        <LABEL_BACKGROUND_COLOR> (optional): label imagemagick background color (default NavajoWhite)
  --imagemagick-binary IMAGEMAGICK_BINARY
                        <LABEL_BINARY> (optional): path to the imagemagick binary (magick or convert) if it cannot be found in $PATH

HTTP parameters:
  -c COOKIE, --cookie COOKIE
                        <COOKIE_STRING> (optional): cookie string to add. Ex: -c "JSESSIONID=1234; YOLO=SWAG"
  -a HEADER, --header HEADER
                        <HEADER> (optional): custom or additional header. Repeat this option for every header. Ex: -a "Host: localhost" -a "Foo: bar"
  -u HTTP_USERNAME, --http-username HTTP_USERNAME
                        <HTTP_USERNAME> (optional): specify a username for HTTP Basic Authentication.
  -b HTTP_PASSWORD, --http-password HTTP_PASSWORD
                        <HTTP_PASSWORD> (optional): specify a password for HTTP Basic Authentication.

Connection parameters:
  -P PROXY, --proxy PROXY
                        <PROXY> (optional): specify a proxy. Ex: -P http://proxy.company.com:8080
  -A PROXY_AUTH, --proxy-auth PROXY_AUTH
                        <PROXY_AUTH> (optional): provides authentication information for the proxy. Ex: -A user:password
  -T PROXY_TYPE, --proxy-type PROXY_TYPE
                        <PROXY_TYPE> (optional): specifies the proxy type, "http" (default), "none" (disable completely), or "socks5". Ex: -T socks
  -t TIMEOUT, --timeout TIMEOUT
                        <TIMEOUT> (optional): renderer execution timeout in seconds (default 30 sec)

Examples

list.txt
--------
http://google.fr
https://216.58.213.131
216.58.213.131
https://duckduckgo.com/robots.txt


Default execution with a list
-----------------------------
$ python webscreenshot.py -i list.txt
webscreenshot.py version 2.3

[+] 4 URLs to be screenshot
[+] 4 actual URLs screenshot
[+] 0 error(s)


Default execution with a single URL
-----------------------------------
$ python webscreenshot.py -v google.fr
webscreenshot.py version 2.3

[INFO][General] 'google.fr' has been formatted as 'http://google.fr:80' with supplied overriding options
[+] 1 URLs to be screenshot
[INFO][http://google.fr:80] Screenshot OK

[+] 1 actual URLs screenshot
[+] 0 error(s)


Increasing verbosity level execution
------------------------------------
$ python webscreenshot.py -i list.txt -v
webscreenshot.py version 2.3

[INFO][General] 'http://google.fr' has been formatted as 'http://google.fr:80' with supplied overriding options
[INFO][General] 'https://216.58.213.131' has been formatted as 'https://216.58.213.131:443' with supplied overriding options
[INFO][General] '216.58.213.131' has been formatted as 'http://216.58.213.131:80' with supplied overriding options
[INFO][General] 'https://duckduckgo.com/robots.txt' has been formatted as 'https://duckduckgo.com:443/robots.txt' with supplied overriding options
[+] 4 URLs to be screenshot
[INFO][https://duckduckgo.com:443/robots.txt] Screenshot OK

[INFO][http://216.58.213.131:80] Screenshot OK

[INFO][https://216.58.213.131:443] Screenshot OK

[INFO][http://google.fr:80] Screenshot OK

[+] 4 actual URLs screenshot
[+] 0 error(s)


Results
-------
$ ls -l screenshots/
total 187
-rwxrwxrwx 1 root root 53805 May 19 16:04 http_216.58.213.131_80.png
-rwxrwxrwx 1 root root 53805 May 19 16:05 http_google.fr_80.png
-rwxrwxrwx 1 root root 53805 May 19 16:04 https_216.58.213.131_443.png
-rwxrwxrwx 1 root root 27864 May 19 16:04 https_duckduckgo.com_443_robots.txt.png
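
The file names in the listing above appear to be the formatted URL with "://", ":" and "/" replaced by underscores. A small sketch of that mapping is given here, assuming a hypothetical helper screenshot_filename; the real script may treat other characters differently and, since version 2.93, limits the file name length for long URLs, which this sketch does not reproduce.

```python
def screenshot_filename(url, extension="png"):
    """Guess the output file name for a formatted URL (hypothetical helper)."""
    # Assumption: only '://', ':' and '/' are rewritten, as the listing suggests.
    name = url.replace("://", "_").replace(":", "_").replace("/", "_")
    return "%s.%s" % (name, extension)
```

This can help when another script needs to locate a screenshot after a run.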

Supported options by renderers

Options not listed below are supported by every current renderer.

Option category        Option                                                 PhantomJS  Chromium / Chrome / Edge Chromium  Firefox
---------------------  -----------------------------------------------------  ---------  ---------------------------------  -------
Screenshot parameters  format (-f)                                            Yes        No                                 No
                       quality (-q)                                           Yes        No                                 No
                       ajax and request timeouts (--ajax-max-timeouts)        Yes        No                                 No
                       crop (--crop)                                          Yes        No                                 No
                       custom JavaScript (--custom-js)                        Yes        No                                 No
HTTP parameters        cookie (-c)                                            Yes        No                                 No
                       header (-a)                                            Yes        No                                 No
                       http_username (-u)                                     Yes        No                                 No
                       http_password (-b)                                     Yes        No                                 No
Connection parameters  proxy (-P)                                             Yes        Yes                                No
                       proxy_auth (-A)                                        Yes        No                                 No
                       proxy_type (-T)                                        Yes        Yes                                No
Ability to screenshot a HTTPS website with a non-publicly-signed certificate  Yes        No                                 No

Requirements

  • A Python interpreter with version 2.7 or 3.X
  • The webscreenshot python script:
    • The easiest way to set it up: pip install webscreenshot and then directly use $ webscreenshot
    • Or git clone this repository, run pip install -r requirements.txt, and then run python webscreenshot.py
  • The PhantomJS tool with at least version 2: follow the installation guide and check the FAQ if necessary
  • Chrome, Chromium or Firefox > 57 if you want to use one of these renderers
  • xvfb if you want to run webscreenshot on a headless OS: the --no-xserver webscreenshot option then runs the renderer through xvfb-run for you
  • ImageMagick binary (magick or convert) if you want to embed URL in screenshots with the --label option: follow the installation guide
  • Check the FAQ before reporting issues

Changelog

  • version 2.94 - 08/23/2020: Added custom-js and single output file options
  • version 2.93 - 08/16/2020: Added support of Python 3.8 and Microsoft Edge Chromium ; file output for failed webscreenshots ; filename length limitation for long URL
  • version 2.92 - 06/21/2020: no_xserver option autodetection
  • version 2.91 - 05/08/2020: Multiprotocol mode fix
  • version 2.9 - 01/26/2020: Few fixes
  • version 2.8 - 01/11/2020: Few fixes, ajax timeouts + crop + label size + label font options added, default values for ajaxTimeout and maxTimeout changed
  • version 2.7 - 01/04/2020: URL embedding in screenshot option added
  • version 2.6 - 12/27/2019: Few fixes
  • version 2.5 - 09/22/2019: Image quality and format options added, PhantomJS useragent updated, modern TLD support
  • version 2.4 - 05/30/2019: Few fixes for Windows support
  • version 2.3 - 05/19/2019: Python 3 compatibility, Firefox renderer added, no-xserver option added
  • version 2.2 - 08/13/2018: Chrome and Chromium renderers support and single URL support
  • version 2.1 - 01/14/2018: Multiprotocol option addition and PyPI packaging
  • version 2.0 - 03/08/2017: Adding proxy-type option
  • version 1.9 - 01/10/2017: Using ALL SSL/TLS ciphers
  • version 1.8 - 07/05/2015: Option groups definition
  • version 1.7 - 06/28/2015: HTTP basic authentication support + loglevel option changed to verbosity
  • version 1.6 - 04/23/2015: Transparent background fix
  • version 1.5 - 01/11/2015: Cookie and custom HTTP header support
  • version 1.4 - 10/12/2014: url-to-image PhantomJS script integration + few bugs corrected
  • version 1.3 - 08/05/2014: Windows support + few bugs corrected
  • version 1.2 - 04/27/2014: Few bugs corrected
  • version 1.1 - 04/21/2014: Changed the script to use PhantomJS instead of the buggy wkhtml binary
  • version 1.0 - 01/12/2014: Initial commit

Copyright and license

webscreenshot is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

webscreenshot is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with webscreenshot. If not, see http://www.gnu.org/licenses/.

Contact

  • Thomas Debize < tdebize at mail d0t com >

webscreenshot's People

Contributors

maaaaz, percevalsa

webscreenshot's Issues

preexec_fn is not supported on Windows platforms

When running on Windows 10 and Python 3.8.0 I received the below error:

python webscreenshot.py google.com
webscreenshot.py version 2.8

[+] 1 URLs to be screenshot
[ERROR][General] Unknown error: preexec_fn is not supported on Windows platforms, exiting
[+] 0 actual URLs screenshot
[+] 1 error(s)
    http://google.com:80

As a quick fix removing "preexec_fn=group_subprocesses" from the call to Popen (shown below) seemed to resolve this. You may want to consider removing this or looking at alternatives.

p = subprocess.Popen(shlex.split(command, posix=not(is_windows())), shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE, preexec_fn=group_subprocesses)
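
One possible alternative to dropping the argument entirely is to choose the Popen keyword arguments per platform. The sketch below assumes that group_subprocesses exists to place the renderer in its own process group or session, which the report does not confirm; process_group_kwargs and run are hypothetical names, and start_new_session requires Python 3.

```python
import subprocess
import sys


def process_group_kwargs(platform=sys.platform):
    """Return Popen keyword arguments that isolate the child process group."""
    if platform.startswith("win"):
        # preexec_fn is rejected on Windows; this creation flag (0x200) asks
        # for a new process group instead.
        flag = getattr(subprocess, "CREATE_NEW_PROCESS_GROUP", 0x00000200)
        return {"creationflags": flag}
    # POSIX, Python 3.2+: the child calls setsid() before exec.
    return {"start_new_session": True}


def run(command_args):
    """Run a command with the platform-appropriate keyword arguments."""
    process = subprocess.Popen(
        command_args,
        shell=False,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        **process_group_kwargs()
    )
    stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr
```

Keeping the platform check in one small function leaves the Popen call itself identical on Windows and Linux.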

All ports screenshots

Is it possible to screenshot a URL on all of its open ports? If so, what is the command? If not, would it be possible to implement it?

thx
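
The help output only documents a single -p PORT applied to every target, so one workaround is to generate the target list yourself in the accepted domain_or_ip:port format. The sketch below assumes the open ports are already known (for example from a separate port scan); expand_ports and write_target_list are hypothetical helpers.

```python
def expand_ports(host, ports):
    """Return one 'host:port' target per distinct port, sorted by port number."""
    return ["%s:%d" % (host, port) for port in sorted(set(ports))]


def write_target_list(path, targets):
    """Write the targets to a text file usable with the -i option."""
    with open(path, "w") as handle:
        handle.write("\n".join(targets) + "\n")
```

The resulting file can then be passed with -i, optionally together with -m to try both HTTP and HTTPS on each port.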

[ERROR][https://www.gpw.pl:443/session-details] Shell command PID 5346 returned an abnormal error code: '1'

[ERROR][https://www.gpw.pl:443/session-details] Shell command PID 5346 returned an abnormal error code: '1'
[ERROR][https://www.gpw.pl:443/session-details] Screenshot somehow failed
This seems to work for a few URLs, but it does not work for most of the others.

I checked that PhantomJS is the latest version:
[root@XXX ~]# phantomjs --version
2.1.1

[root@XXX ~]# phantomjs
phantomjs> phantom.version
{
"major": 2,
"minor": 1,
"patch": 1
}

I get this error whenever I run the script and can't seem to find a solution...

Memory Exhaustion possibly due to unfinished PhantomJS Processes

While running webscreenshot on a Digital Ocean Ubuntu box with 2 GB of RAM on a list of approximately 7500 valid HTTP servers, the tool starts working correctly, with an overall memory consumption of 250-300 MB for the whole OS.

As the tool keeps working, memory consumption increases over time until none is left and the tool starts reporting errors. I took a screenshot of htop during execution, attached below, while everything was still working, and I noticed dozens of phantomjs processes, although I set the tool to run just 1 worker. If I kill all these processes, everything goes back to normal.

The command used was the following:

webscreenshot --no-xserver -r phantomjs -w 1 --window-size 640,360 -o output_dir -i servers_file

That was run inside a screen session, although I don't think that makes a difference.

There could be an issue in how phantomjs processes are terminated. It would be great if this could be reviewed.

Thanks in advance,

Htop screenshot:

Screen Shot 2019-10-31 at 14 12 33
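
While the cause is being investigated, one way to bound the impact is to split a large target list into smaller batches and run the tool once per batch, so that leftover renderer processes can be checked or cleaned up between runs. This is only a workaround sketch with a hypothetical helper chunk_targets; it does not address the underlying process handling.

```python
def chunk_targets(targets, size):
    """Split a list of targets into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [targets[start:start + size] for start in range(0, len(targets), size)]
```

Each batch can be written to its own text file and passed with -i.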

custom filename with timestamp suffix

Hi, this is a great tool.
I'd really like to be able to change the output filename for each png saved. Ideally, there would be a command line parameter that would simply append a datestamp as a string to the existing filename, which is the url. Is this something that is possible through the command line, or would I have to fork the repo and adjust this line to add an option to append it?
You mention there are some length constraints even below 255 for certain renderers. Are those constraints documented somewhere? What ballpark should be avoided?
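
Nothing in the documented options appends a datestamp, so one approach that needs no fork is to rename the files after the run. The sketch below assumes the default PNG output and a hypothetical helper add_timestamp_suffix; running it twice on the same directory would add a second suffix.

```python
import time
from pathlib import Path


def add_timestamp_suffix(directory, stamp=None, pattern="*.png"):
    """Rename every matching screenshot to <name>_<stamp><extension>."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    renamed = []
    # The file list is materialized before any rename takes place.
    for path in sorted(Path(directory).glob(pattern)):
        target = path.with_name("%s_%s%s" % (path.stem, stamp, path.suffix))
        path.rename(target)
        renamed.append(target)
    return renamed
```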

Screenshot went wrong

Hi,
I like your tool.
But I have an error on some pages, like this:
[ERROR][https://......] Screenshot somehow failed

I think it is a problem because of an embedded youtube video

Why is the screenshot folder empty?

I have installed this tool successfully. I tried to run python webscreenshot.py -i test.txt, but the screenshots directory is empty. Why?

Best quality possible?

First I have to say that it's a great tool. I am so happy that I found it!

Now I need it for a screenshot where I use the cv2 library on it. And the quality leads to about 10% errors.
Here you can find a part of it.

My settings are

quality=100
window_size=2400,1000
format=jpg
...

Of course, I tried around with the numbers. But with highest quality the window size does not change a lot.

Any ideas or hints?
Thanks!

Obtain output file name

Hello,

How can I obtain the output file name after a successful screenshot while calling from inside of another Python script?

Replace phantomjs by headless Google Chrome

Hi !

PhantomJS's only maintainer has stopped maintaining the software, following the release of the headless mode of Google Chrome in version 59.

Have you considered migrating webscreenshot's backend accordingly?

Thanks in advance !

😛

Shell command PID xxxx returned abnormal error code '-6'.

Hello, having some issues getting this to run. Just for a test here is what is happening:

list.txt
google.com
twitter.com

command being issued:
python webscreenshot.py -i testlist.txt -vv

Output:

python webscreenshot.py -i testlist.txt -vv

webscreenshot.py version 1.8

[DEBUG][General] Options: {'log_level': 'DEBUG', 'http_username': None, 'input_file': 'testlist.txt', 'workers': 2, 'output_directory': None, 'header': None, 'verbosity': 2, 'cookie': None, 'proxy': None, 'timeout': 30, 'proxy_auth': None, 'http_password': None, 'ssl': False, 'port': None}

[INFO][General] 'google.com' has been formatted as 'http://google.com:80' with supplied overriding options
[+] 1 URLs to be screenshot
[DEBUG][http://google.com:80] Shell command to be executed
'phantomjs --ignore-ssl-errors true --ssl-protocol any "/home/osintscreenshot/webscreenshot/original/webscreenshot.js" url_capture="http://google.com:80" output_file="/home/osintscreenshot/webscreenshot/original/screenshots/http_google.com_80.png"'

[ERROR][http://google.com:80] Shell command PID 29124 returned an abnormal error code: '-6'
[ERROR][http://google.com:80] Screenshot somehow failed

[+] 0 actual URLs screenshot
[+] 1 error(s)
http://google.com:80

Any ideas?

renderer binary could not have been found in your current PATH environment variable.

When I try to run webscreenshot after installing the requirements, it gives me this error:
webscreenshot.py version 2.4

[+] 4 URLs to be screenshot
[ERROR][https://173.194.67.113:443] renderer binary could not have been found in your current PATH environment variable, exiting
[ERROR][http://173.194.67.113:80] renderer binary could not have been found in your current PATH environment variable, exiting
[ERROR][https://duckduckgo.com:443/robots.txt] renderer binary could not have been found in your current PATH environment variable, exiting
[ERROR][http://google.fr:80] renderer binary could not have been found in your current PATH environment variable, exiting
[+] 4 actual URLs screenshot
[+] 0 error(s)

3D maps from maps.google.com do not work

python webscreenshot.py -r chromium 'https://www.google.com.ph/maps/@55.6652884,12.4946009,89a,35y,254.22h,47.58t/data=!3m1!1e3?hl=en' gives the loading image and not the full 3D map as you get in a browser.

AttributeError: 'Namespace' object has no attribute 'format'

This is the error code (line 361, in the craft_cmd function):

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/XUSERX/.local/lib/python3.6/site-packages/webscreenshot/webscreenshot.py", line 361, in craft_cmd
    output_format = options.format if options.renderer == 'phantomjs' else 'png'
AttributeError: 'Namespace' object has no attribute 'format'

Would be great if I could get it run!
Thanks for help.

Redirect Support / Nodejs Webpages / Additional Logging?

Does webscreenshot support redirects when taking screenshots?

For example, this URL generates a blank screenshot with chrome as the web driver and webscreenshot version 2.4:

https://mohaaaa.co.uk

The below URL is for a node.js powered site:

https://videos.dinofly.com

When it is screenshot, it doesn't appear to be loading all of the JavaScript / finishing the rendering as it would appear to an end user? Any idea on this one?

Also, rather than outputting messages such as Screenshot somehow failed, is there a way to find out why it somehow failed? I think it would be nice to have more info if possible.

Using tool with external python script on a Jupyter Notebook

Good day, I have installed webscreenshot.
I tried running it from Python, but I ran into this error:

[+] 2 URLs to be screenshot

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\z0014071\webscreenshot.py", line 361, in craft_cmd
    output_format = options.format if options.renderer == 'phantomjs' else 'png'
AttributeError: 'Namespace' object has no attribute 'format'
"""

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
<ipython-input-18-74b6b93cdc5a> in <module>()
      9 
     10 # actually launching the function
---> 11 take_screenshot(url_list, options)

~\webscreenshot.py in take_screenshot(url_list, options)
    437     pool = multiprocessing.Pool(processes=int(options.workers), initializer=init_worker)
    438 
--> 439     taken_screenshots = [r for r in pool.imap(func=craft_cmd, iterable=izip(url_list, itertools.repeat(options)))]
    440 
    441     screenshots_error_url = [url for retval, url in taken_screenshots if retval == SHELL_EXECUTION_ERROR]

~\webscreenshot.py in <listcomp>(.0)
    437     pool = multiprocessing.Pool(processes=int(options.workers), initializer=init_worker)
    438 
--> 439     taken_screenshots = [r for r in pool.imap(func=craft_cmd, iterable=izip(url_list, itertools.repeat(options)))]
    440 
    441     screenshots_error_url = [url for retval, url in taken_screenshots if retval == SHELL_EXECUTION_ERROR]

C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py in next(self, timeout)
    746         if success:
    747             return value
--> 748         raise value
    749 
    750     __next__ = next                    # XXX

AttributeError: 'Namespace' object has no attribute 'format'

Here is my code:

import argparse
from webscreenshot import *

# url list to screenshot
url_list = ['http://google.de', 'http://google.com']

# defining options manually
options = argparse.Namespace(URL=None, cookie=None, header=None, http_password=None, http_username=None, input_file=None, log_level='DEBUG', multiprotocol=False, no_xserver=False, output_directory='/tmp/screenshots', port=None, proxy=None, proxy_auth=None, proxy_type=None, renderer='phantomjs', renderer_binary=None, ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4)

# actually launching the function
take_screenshot(url_list, options)

taken from
#19 (comment)
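
The traceback indicates that the hand-built Namespace lacks attributes that newer versions read, such as format. A possible way around this is to start from a full set of defaults and override only what differs. The values below are copied from the options printed by version 2.91 in the "Screenshot not being saved" report on this page; DEFAULT_OPTIONS and build_options are hypothetical names, and other versions may expect additional attributes.

```python
import argparse

# Assumption: attribute names and values as printed by version 2.91 with -vv.
DEFAULT_OPTIONS = {
    "URL": None, "input_file": None, "output_directory": "/tmp/screenshots",
    "workers": 4, "verbosity": 2, "log_level": "DEBUG",
    "port": None, "ssl": False, "multiprotocol": False,
    "renderer": "phantomjs", "renderer_binary": None, "no_xserver": False,
    "window_size": "1200,800", "format": "png", "quality": 75,
    "ajax_max_timeouts": "1400,1800", "crop": None,
    "label": False, "label_size": 60, "label_bg_color": "NavajoWhite",
    "imagemagick_binary": None,
    "cookie": None, "header": None, "http_username": None, "http_password": None,
    "proxy": None, "proxy_auth": None, "proxy_type": None, "timeout": 30,
}


def build_options(**overrides):
    """Return a Namespace holding every default, updated with the overrides."""
    merged = dict(DEFAULT_OPTIONS)
    merged.update(overrides)
    return argparse.Namespace(**merged)
```

The result of build_options(renderer="chrome") could then be passed where the snippet above builds its Namespace by hand.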

How to use webscreenshot from inside a python script?

The documentation states:

pip install webscreenshot and then directly use webscreenshot

How does one directly use webscreenshot?

My python script contains:

import webscreenshot

Now, how do I call webscreenshot directly from the script? The documentation doesn't provide any examples. It does for calling the script from the commandline and passing arguments, but I want to call it directly from inside my python script.

webscreenshot.take_screenshot(list_of_urls) doesn't seem to work.
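
A conservative option that relies only on the documented command line is to call the installed webscreenshot command through subprocess. The sketch below assumes the command is on PATH after pip install; build_command and run_webscreenshot are hypothetical helpers, not functions of the package.

```python
import subprocess


def build_command(input_file, output_directory, renderer="phantomjs", workers=4):
    """Assemble the argument list using options from the help output."""
    return [
        "webscreenshot",
        "-i", input_file,
        "-o", output_directory,
        "-r", renderer,
        "-w", str(workers),
    ]


def run_webscreenshot(input_file, output_directory, **kwargs):
    """Run the tool and return its exit code and combined text output."""
    completed = subprocess.run(
        build_command(input_file, output_directory, **kwargs),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return completed.returncode, completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with paths and URLs.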

Screenshot not being saved

Dear Maaaaz

First of all, thank you so much for such a great tool (even though I am not able to use it correctly). I am running the latest Raspbian OS (32-bit) on my Raspberry Pi 4 and the installation went pretty well. However, if I run the script as intended, it neither saves the screenshot nor returns any errors.

All the folders in which I am working have permission 777, including the screenshots folder, which was successfully created by your program. I have tried to force SSL and tried different websites, but none of them works. I used to use the renderer binary from chromium but, as I saw no results, switched to PhantomJS, still with no results.

Here is the debug info (though I don't know if this is worth anything for you):

root@**hostname**:/somepath/# python webscreenshot -vv --renderer-binary /opt/phantomjs-2.1.1-linux-i686/bin/phantomjs 192.168.0.220
webscreenshot.py version 2.91

[DEBUG][General] Options: Namespace(URL='192.168.0.220', ajax_max_timeouts='1400,1800', cookie=None, crop=None, format='png', header=None, http_password=None, http_username=None, imagemagick_binary=None, input_file=None, label=False, label_bg_color='NavajoWhite', label_size=60, log_level='DEBUG', multiprotocol=False, no_xserver=False, output_directory='/somepath/screenshots', port=None, proxy=None, proxy_auth=None, proxy_type=None, quality=75, renderer='phantomjs', renderer_binary='/opt/phantomjs-2.1.1-linux-i686/bin/phantomjs', ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4)

[INFO][General] '192.168.0.220' has been formatted as 'http://192.168.0.220:80' with supplied overriding options
[+] 1 URLs to be screenshot
[DEBUG][http://192.168.0.220:80] Shell command to be executed
'/opt/phantomjs-2.1.1-linux-i686/bin/phantomjs --ignore-ssl-errors=true --ssl-protocol=any --ssl-ciphers=ALL "/usr/local/lib/python3.7/dist-packages/webscreenshot/webscreenshot.js" url_capture=http://192.168.0.220:80 output_file="/somepath/screenshots/http_192.168.0.220_80.png" width=1200 height=800 format=png quality=75 ajaxtimeout=1400 maxtimeout=1800'

[+] 1 actual URLs screenshot
[+] 0 error(s)

As you can see, I am running the latest version of everything and the system is up to date :) Any ideas?

Best regards
Marco Leder

Transparent background

When dealing with websites with transparent background, it's hard to take a look at the captures.

Could you save the captures as jpg files or force a blank background ?

Thx.

Screenshot somehow failed on some websites / Shell command abnormal code

Hey!
I am trying to use your program to download ~2k screenshots, running it for ~150 links at a time.
For some websites it just doesn't work, saying both "Screenshot somehow failed" and "Shell command PID ---- returned abnormal error code '1' ". I ran it with xvfb, without xvfb, with -vv, without -vv.
The websites that don't work aren't consistent: if I run it now, I might get 5/31 (list with links that fail); if I run it again, I might get 4 or 6. Also, running it with -m only helps with some of the websites; a handful of them still fail on both :443 and :80.
I have the latest PhantomJS version, I have installed webscreenshot from pip and I also cloned the repo, ran it both ways a number of times.
The network I'm connected to does not have any restrictions.
The website list is made up of the first few thousand links from the Alexa top 1 million.
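
Since the failures vary from run to run, retrying only the failed URLs may help. The sketch below extracts them from the console output, assuming the format seen in the reports on this page, where the failed URLs are printed after the "[+] N error(s)" line; failed_urls is a hypothetical helper.

```python
import re

ERROR_HEADER_RE = re.compile(r"^\[\+\] (\d+) error\(s\)$")


def failed_urls(output):
    """Return the URLs listed after the '[+] N error(s)' summary line."""
    lines = output.splitlines()
    for index, line in enumerate(lines):
        if ERROR_HEADER_RE.match(line.strip()):
            # Everything non-empty after the summary line is taken as a URL.
            return [entry.strip() for entry in lines[index + 1:] if entry.strip()]
    return []
```

The returned list can be written to a new text file and passed again with -i.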

Domain does not get formatted automatically, which causes the command to fail

I tried to execute it from within Python code, but it kept failing. I took the DEBUG output and compared it with the output obtained when the command is run directly from the shell.
It turned out that when I provide a list of domains from within other Python code, the domains aren't converted from google.com to http://google.com:80, which causes the command to fail.

Screenshot when ran inside python
Screenshot from 2020-12-15 19-23-54

Screenshot when ran directly
Screenshot from 2020-12-15 19-24-52

Issue with PATH env

Got some issue with the PATH env.
I'm trying to make it work under Win7.

set PHANTOMJS_BIN="C:\Program Files\phantomjs"

[WinError 2] The system cannot find the file specified
[ERROR][http://abc.xyz:80] renderer binary could not have been found in your current PATH environment variable, exiting

Tried with explicitly adding -r phantomjs, or chrome, chromium, none of them work.
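
The help output mentions only $PATH and the --renderer-binary option for locating the renderer; nothing in it suggests that a PHANTOMJS_BIN variable is read. A quick way to check what can actually be found is sketched below, with find_renderer as a hypothetical helper built on shutil.which.

```python
import os
import shutil


def find_renderer(name="phantomjs", extra_dirs=()):
    """Return the full path of the renderer binary, or None if not found."""
    found = shutil.which(name)
    if found is None and extra_dirs:
        # Also look in directories that are not on PATH.
        found = shutil.which(name, path=os.pathsep.join(extra_dirs))
    return found
```

If this returns a path, that path can be given to --renderer-binary; if it returns None, the binary is not where the script can find it.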

Encoding issues

Apparently the script doesn't work unless the input file is encoded in "UTF-8 without BOM".
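
Until the script handles this itself, the input file can be rewritten without the byte order mark. A minimal Python 3 sketch, with strip_bom as a hypothetical helper: the utf-8-sig codec drops a leading BOM when present and reads plain UTF-8 unchanged.

```python
def strip_bom(source, destination):
    """Copy a UTF-8 text file, dropping a leading byte order mark if present."""
    # newline="" keeps the original line endings untouched.
    with open(source, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    with open(destination, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```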

Window Size Configuration (either default or not) Doesn't Seem to Work

Hello maaaaz, and thanks again for this great tool and the support. Today I am reporting an issue that seems pretty obvious to me, so I don't know if it happens under certain conditions in my environments, although I tested it both on Linux and Mac. Let me know if I can provide more data than the following:

  • Issue: Output screenshots contain the whole page, resulting in very tall images, even though the default height should be 800 px, and the same happens when using the --window-size argument.
  • Tested Version: webscreenshot 2.8 (but it happened with older versions too)
  • Tested OS: Linux and Mac
  • Reproduction steps on Linux: Run the following command and check that the output image is more than 360 pixels high.
python3 webscreenshot.py --no-xserver -r phantomjs -w 1 --window-size 640,360 -o ~ https://www.apple.com/mx/mac/

  • Reproduction steps on Mac: Run the following command and check that the output image is more than 360 pixels high.

python3 webscreenshot.py --no-xserver -r phantomjs -w 1 --window-size 640,360 -o ~ https://www.apple.com/mx/mac/

Output Image:

[Image: https_www apple com_443_mx_mac_]

error code: 1

I tried the solution given by @putsi, but it didn't work. It is still giving me the same error.
Command: xvfb-run python webscreenshot.py -i file.txt -o ~/dir/dir/result.txt -w 20 -a "X-FORWARDED-FOR: 127.0.0.1"

So what should I do now?

Python 2.7/3.7

Is this program for Python 3.7? When I try to run the program it says "Missing parentheses in call to 'print'. Did you mean print("[+] %s URLs to be screenshot" % screenshot_number)?". So I assumed it was written for Python 2.7. Would you like me to add the parentheses? Or maybe you could state that it's only for Python 2.7.

Ajax actions before taking the snapshot

Would it be possible to perform an action before taking the snapshots?
For example, clicking a button inside the webpage to show some hidden information:

from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By

def lookup(driver):
    driver.get("url")
    try:
        # box = driver.wait.until(EC.presence_of_element_located((By.NAME, "q")))
        # button = driver.find_element(By.CLASS_NAME, 'oc-home-search__button')
        # Click the button that reveals the hidden information
        button = driver.find_element(By.ID, 'showPhone')
        # box.send_keys(query)
        button.click()

        image = driver.find_element(By.CLASS_NAME, 'telnumber')
        print(image.get_attribute('element'))
        # Locate the div whose inline style sets a background image
        abc = driver.find_element(By.XPATH, "//div[contains(@style, 'background-image')]")
        # print(image.get_attribute("url"))
    except (NoSuchElementException, TimeoutException):
        print("Box or Button not found!")

Crop/resize image?

The screenshots are always the full height of the webpage.

/usr/bin/env python3 webscreenshot.py --no-xserver -f jpg -q 80 --window-size 1600,768 blogger.com
I was wondering if it is possible to define the viewport?

Shell command PID XXXXX returned an abnormal error code: '-11'

Hi,
I'm getting this error on this specific URL. It does create the screenshot OK, but the error still appears.
Screenshots on other links in the same domain seem to work ok.

urls.txt:
http://www.milenio.com/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza

Command used:
webscreenshot -vv -i urls.txt -o news_extractor/static/screenshots --ajax-max-timeouts 2500,2500

Output:

webscreenshot.py version 2.94

[DEBUG][General] Options: Namespace(URL=None, ajax_max_timeouts='2500,2500', cookie=None, crop=None, custom_js=None, format='png', header=None, http_password=None, http_username=None, imagemagick_binary=None, input_file='urls.txt', label=False, label_bg_color='NavajoWhite', label_size=60, log_level='DEBUG', multiprotocol=False, no_error_file=False, no_xserver=False, output_directory='/Users/aldricmp1/PycharmProjects/newsextractorapp/news_extractor/static/screenshots', port=None, proxy=None, proxy_auth=None, proxy_type=None, quality=75, renderer='phantomjs', renderer_binary=None, single_output_file=None, ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4)

[INFO][General] 'http://www.milenio.com/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza' has been formatted as 'http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza' with supplied overriding options
[+] 1 URLs to be screenshot
[DEBUG][http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza] Shell command to be executed
'phantomjs --ignore-ssl-errors=true --ssl-protocol=any --ssl-ciphers=ALL "/Users/aldricmp1/opt/anaconda3/envs/newsextractorapp/lib/python3.6/site-packages/webscreenshot/webscreenshot.js" url_capture=http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza output_file="/Users/aldricmp1/PycharmProjects/newsextractorapp/news_extractor/static/screenshots/http_www.milenio.com_80_politica_lopez-gatell-habra-hospitales-llenos-por-epoca-influenza.png" width=1200 height=800 format=png quality=75 ajaxtimeout=2500 maxtimeout=2500'

[ERROR][http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza] Shell command PID 91585 returned an abnormal error code: '-11'
[ERROR][http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza] Screenshot somehow failed

[+] 0 actual URLs screenshot
[+] 1 error(s)
http://www.milenio.com:80/politica/lopez-gatell-habra-hospitales-llenos-por-epoca-influenza
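
For context, Python's subprocess module reports a negative return code -N on POSIX when the child process was terminated by signal N. The sketch below decodes such a value; `describe_returncode` is a hypothetical helper, not something webscreenshot provides.

```python
import signal

def describe_returncode(code):
    """Explain a return code the way Python's subprocess reports it on POSIX."""
    if code < 0:
        # A negative value -N means the child was terminated by signal N
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {-code} ({name})"
    return f"exited with status {code}"

print(describe_returncode(-11))
```

Read this way, '-11' would suggest that the renderer process received signal 11 (a segmentation fault), possibly after the image had already been written, although the log alone does not confirm that.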

Shell command PID xxxx returned an abnormal error code: '4294967295'

Hi

I got an error message.

How can I solve it?

My environment:

Windows10 64bit
python 3.8
phantomjs 2.5.0-development
webscreenshot 2.9

C:\Program Files\Python38\Lib\site-packages\webscreenshot> python .\webscreenshot.py http://www.google.co.kr -r phantomjs --renderer-binary "C:\Program Files\phantomjs-2.5.0-beta-windows\bin\phantomjs.exe"

webscreenshot.py version 2.9

[+] 1 URLs to be screenshot
[ERROR][http://www.google.co.kr:80] Shell command PID 5012 returned an abnormal error code: '4294967295'
[ERROR][http://www.google.co.kr:80] Screenshot somehow failed

[+] 0 actual URLs screenshot
[+] 1 error(s)
    http://www.google.co.kr:80
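
As a side note on reading the number itself: 4294967295 is 0xFFFFFFFF, which is what -1 looks like when a 32-bit exit code is shown as an unsigned integer. The sketch below only demonstrates that arithmetic; `to_signed32` is a hypothetical helper, and the value does not say why the renderer returned -1.

```python
def to_signed32(code):
    """Reinterpret an unsigned 32-bit exit code as a signed 32-bit integer."""
    code &= 0xFFFFFFFF
    return code - 0x100000000 if code >= 0x80000000 else code

print(to_signed32(4294967295))  # -1
```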

URL pre-formatter forces the https protocol when the port is 443, even though http is provided

When webscreenshot is supplied a list of URLs and one of them is http://domain.com:443 (note the http protocol instead of https), the formatter changes the protocol from http to https.

Example (the real domain was changed to domain.com):

webscreenshot -vv http://domain.com:443
[INFO][General] 'http://domain.com:443' has been formatted as 'https://domain.com:443' with supplied overriding options

I understand that 443 is the port typically used for https, but it is also possible to run http, or any other protocol, on 443. While doing recon I have seen cases of http (not https) running on 443.

When someone supplies a list that already provides the protocol and the port, I believe it doesn't make sense to run the pre-formatter.
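
The behavior proposed here could look roughly like the sketch below: an explicit protocol is kept untouched, and the port is only used as a hint when no protocol was given. `pick_scheme` is a hypothetical name for illustration and does not reflect the current formatter code.

```python
from urllib.parse import urlsplit

def pick_scheme(target, default_scheme="http"):
    """Return the explicit scheme if one is given, else guess it from the port."""
    if "://" in target:
        # An explicit scheme is kept as-is, even on an unusual port
        return urlsplit(target).scheme
    # Prefix "//" so that urlsplit parses host and port of a bare target
    port = urlsplit("//" + target).port
    return "https" if port == 443 else default_scheme

for t in ["http://domain.com:443", "domain.com:443", "domain.com"]:
    print(t, "->", pick_scheme(t))
```

With this approach http://domain.com:443 stays on http, while a bare domain.com:443 is still guessed as https.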

Workaround for 403 or Forbidden

https://stackoverflow.com/questions/59954122/phantomjs-page-render-blank-white-background-image-on-403

  • page.render doesn't seem to work if the page returns a 403 (Forbidden) status.
  • Do you have any workaround in mind to get the actual screenshot of the page instead of a transparent or plain background image?

Webscreenshot.py making a weird screenshot (used within different python script)

Hi, I'm currently using the take_screenshot function from another Python script, with Python 3.6 on Ubuntu.

options = argparse.Namespace(URL=None, ajax_max_timeouts='1400,1800', cookie=None, crop=None, format='png', header=None, http_password=None, http_username=None, imagemagick_binary=None, input_file=None, label=False, log_level='DEBUG', multiprotocol=False, no_xserver=True, output_directory=path, port=None, proxy=None, proxy_auth=None, proxy_type=None, quality=75, renderer='phantomjs', renderer_binary=None, ssl=False, timeout=30, verbosity=2, window_size='1200,800', workers=4) -> take_screenshot(url_container, options)

The url_container just contains a bunch of URLs.

The output generated for a specific page that I want to monitor looks like this:
[Image: balken]

but it should look more like this:
[Image: balken 2]

Why does this happen?

No option to wait until page is loaded

I am trying to screenshot a webpage that takes a few moments to fully load, and the script is simply running and screenshotting before the page can load. Is there an option to set a sleep period? Or could one be added?

Cookie parameter ignored when using chrome as engine

Using this command does not work, as the option -c seems to be ignored:

webscreenshot -v -r chromium --no-xserver -c COOKIE "JSESSIONID=1234; YOLO=SWAG" http://grim.at/tmp/cookie.php

Looking in the source code it seems as if the option is only present using phantomjs - can you please fix this?

Thanks in advance!
Wolfgang
