
rajatomar788 / pywebcopy


Locally saves webpages to your hard disk with images, css, js & links as is.

Home Page: https://rajatomar788.github.io/pywebcopy/

License: Other

webpage python html html-parser mirror archive-tool web crawler

pywebcopy's Introduction

    ____       _       __     __    ______                     _____   
   / __ \__  _| |     / /__  / /_  / ____/___  ____  __  __   /__  /   
  / /_/ / / / / | /| / / _ \/ __ \/ /   / __ \/ __ \/ / / /     / /    
 / ____/ /_/ /| |/ |/ /  __/ /_/ / /___/ /_/ / /_/ / /_/ /     / /     
/_/    \__, / |__/|__/\___/_.___/\____/\____/ .___/\__, /     /_/      
      /____/                               /_/    /____/               

Created by: Raja Tomar | License: Apache License 2.0 | Email: [email protected]

PyWebCopy is a free tool for copying full or partial websites locally onto your hard-disk for offline viewing.

PyWebCopy will scan the specified website and download its content onto your hard-disk. Links to resources such as style-sheets, images, and other pages in the website will automatically be remapped to match the local path. Using its extensive configuration you can define which parts of a website will be copied and how.

What can PyWebCopy do?

PyWebCopy will examine the HTML mark-up of a website and attempt to discover all linked resources such as other pages, images, videos, file downloads - anything and everything. It will download all of these resources, and continue to search for more. In this manner, PyWebCopy can "crawl" an entire website and download everything it sees in an effort to create a reasonable facsimile of the source website.

What can PyWebCopy not do?

PyWebCopy does not include a virtual DOM or any form of JavaScript parsing. If a website makes heavy use of JavaScript to operate, PyWebCopy is unlikely to be able to make a true copy, since it cannot discover links that JavaScript generates dynamically.

PyWebCopy does not download the raw source code of a website; it can only download what the HTTP server returns. While it will do its best to create an offline copy of a website, advanced data-driven websites may not work as expected once they have been copied.

Installation

pywebcopy is available on PyPI and is easily installable using pip:

$ pip install pywebcopy

You are ready to go. Read the tutorials below to get started.

First steps

You should always check that the latest pywebcopy installed successfully:

>>> import pywebcopy
>>> pywebcopy.__version__
7.x.x

Your version may be different; now you can continue with the tutorial.

Basic Usage

To save any single page, just type into the Python console:

from pywebcopy import save_webpage
save_webpage(
    url="https://httpbin.org/",
    project_folder="E://savedpages//",
    project_name="my_site",
    bypass_robots=True,
    debug=True,
    open_in_browser=True,
    delay=None,
    threaded=False,
)

To save a full website (this could overload the target server, so be careful):

from pywebcopy import save_website
save_website(
    url="https://httpbin.org/",
    project_folder="E://savedpages//",
    project_name="my_site",
    bypass_robots=True,
    debug=True,
    open_in_browser=True,
    delay=None,
    threaded=False,
)
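
The delay and threaded keywords mirror the CLI's --delay and --threaded options described further below: a delay between consecutive requests goes easier on the target server, while threading trades that politeness for download speed.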

Running Tests

Running tests is simple and doesn't require any external library. Just run this command from the root directory of the pywebcopy package.

$ python -m pywebcopy --tests

Command Line Interface

pywebcopy has an easy-to-use command-line interface which can help you get the task done without having to worry about the library's internals.

  • Getting the list of commands

    $ python -m pywebcopy --help
  • Using the CLI

    Usage: pywebcopy [-p|--page|-s|--site|-t|--tests] [--url=URL [,--location=LOCATION [,--name=NAME [,--pop [,--bypass_robots [,--quite [,--delay=DELAY]]]]]]]
    
    Python library to clone/archive pages or sites from the Internet.
    
    Options:
      --version             show program's version number and exit
      -h, --help            show this help message and exit
      --url=URL             url of the entry point to be retrieved.
      --location=LOCATION   Location where files are to be stored.
      -n NAME, --name=NAME  Project name of this run.
      -d DELAY, --delay=DELAY
                            Delay between consecutive requests to the server.
      --bypass_robots       Bypass the robots.txt restrictions.
      --threaded            Use threads for faster downloading.
      -q, --quite           Suppress the logging from this library.
      --pop                 open the html page in default browser window after
                            finishing the task.
    
      CLI Actions List:
        Primary actions available through cli.
    
        -p, --page          Quickly saves a single page.
        -s, --site          Saves the complete site.
        -t, --tests         Runs tests for this library.
    
    
    
  • Running tests

      $ python -m pywebcopy run_tests

Authentication and Cookies

Most of the time, authentication is needed to access a certain page. It's really easy to authenticate with pywebcopy because it uses a requests.Session object for the underlying HTTP activity, which can be accessed through the WebPage.session attribute. And as you know, there are tons of tutorials on setting up authentication with requests.Session.
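
For instance, here is a minimal sketch of attaching HTTP Basic auth and a pre-set cookie to that session before fetching, assuming only the standard requests.Session API (the credentials and cookie name below are placeholders):

from pywebcopy.configs import get_config

config = get_config('http://httpbin.org/')
wp = config.create_page()
wp.session.auth = ('user', 'pass')             # HTTP Basic auth via requests
wp.session.cookies.set('sessionid', 'abc123')  # placeholder cookie
wp.get(config['project_url'])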

Here is an example of filling in a form:

from pywebcopy.configs import get_config

config = get_config('http://httpbin.org/')
wp = config.create_page()
wp.get(config['project_url'])
form = wp.get_forms()[0]               # grab the first form on the page
form.inputs['email'].value = 'bar'     # fill in the fields
form.inputs['password'].value = 'baz'
wp.submit_form(form)                   # submit the filled form
wp.get_links()

You can read more in the docs folder of the GitHub repository.

pywebcopy's People

Contributors

aodmrz, codacy-badger, dependabot[bot], gallavee, kmossy, kuceramartin, maxpearl, monim67, nickveld, rajatomar788, serbathome


pywebcopy's Issues

How to fill in a login form

Hi Rajatomar,

Would you please indicate how to log in to a website, or which class or function we should amend to implement this feature?

pywebcopy/configs.py, setup_paths method changes the working directory

Hello,
I encountered an issue with the scraping tool.
There is a command, os.chdir(norm_p), in the setup_paths method that changes the working directory.
The issue appears when I send several scraping requests with the same config: it tries to write files to recursively nested folders.
For example, if I'm trying to scrape files to an output directory with folder name 'test', the second request will try to scrape to /output/test/output/test/..
To fix it I wrote the following lines:

for n in range(3):
    os.chdir("..")

Please advise
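
A more general workaround (an assumption, not an official fix) would be to save and restore the working directory around each call instead of counting os.chdir("..") steps:

import os

cwd = os.getcwd()
try:
    pass  # ... issue the scraping request here ...
finally:
    os.chdir(cwd)  # undo whatever setup_paths changed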

setup_config() got an unexpected keyword argument 'url'

Python version: 3.7.7

I'm using the same code as in the README, but I'm getting the following error when running it:

Traceback (most recent call last):
  File "main.py", line 14, in <module>
    pywebcopy.config.setup_config(**kwargs)
TypeError: setup_config() got an unexpected keyword argument 'url'

This is the code:

import pywebcopy

# Rest of the code is as usual
kwargs = {
    'url': 'XXX',
    'project_folder': '/Users/kennymeyer/Projects/zyx',
    'project_name': 'zyx',
    'bypass_robots': True,
}
pywebcopy.config.setup_config(**kwargs)
pywebcopy.save_webpage(**kwargs)

What am I doing wrong?

CSS images are not downloaded

Hi, cool project, it works very well. I just noticed one small bug when it tries to download CSS-linked files:

css:
background-image: url('../img/email.png');

Error when downloading the image:

(Pdb) req
<Response [403]>

(Pdb) req.url
"/img/email.png'"

(Pdb) bt
/pywebcopy/elements.py(341)run()
-> f.run()
/pywebcopy/elements.py(56)run()
-> self.download_file()

/pywebcopy/elements.py(110)download_file()
-> LOGGER.error(

saved / transformed css:
background-image: url(../img/35e3b271__email.png%27);

It seems the parsing of url() calls in CSS doesn't anticipate single-quoted strings?
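
For reference, a quote-aware url() extractor can be sketched with a back-referencing regex (illustrative only; pywebcopy's actual CSS parser may differ):

import re

# group 1 captures an optional quote character; group 2 is the bare URL
# with the surrounding quotes already stripped off
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")

def extract_css_urls(css_text):
    return [m.group(2) for m in CSS_URL_RE.finditer(css_text)]

print(extract_css_urls("background-image: url('../img/email.png');"))
# -> ['../img/email.png']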

cannot import name url2pathname

OS: Ubuntu 18.04.3 LTS
Python: Python 2.7.15+

I have installed pywebcopy via pip on a fresh Ubuntu install and I'm getting the following error:

~# python -m pywebcopy
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 163, in _run_module_as_main
    mod_name, _Error)
  File "/usr/lib/python2.7/runpy.py", line 111, in _get_module_details
    __import__(mod_name)  # Do not catch exceptions initializing package
  File "/usr/local/lib/python2.7/dist-packages/pywebcopy/__init__.py", line 34, in <module>
    from .configs import config, SESSION
  File "/usr/local/lib/python2.7/dist-packages/pywebcopy/configs.py", line 16, in <module>
    from .compat import urlparse, urljoin
  File "/usr/local/lib/python2.7/dist-packages/pywebcopy/compat.py", line 30, in <module>
    from urlparse import urlparse, urlunparse, urljoin, urlsplit, urldefrag, url2pathname, pathname2url
ImportError: cannot import name url2pathname

I'm fairly certain that urlparse is a standard module, so I'm not totally sure why url2pathname cannot be imported. Suggestions?
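
For what it's worth, on Python 2 url2pathname lives in the urllib module rather than urlparse, so a compatibility shim along these lines would work (a sketch, not necessarily the library's actual fix):

try:
    from urllib.request import url2pathname, pathname2url  # Python 3
except ImportError:
    from urllib import url2pathname, pathname2url  # Python 2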

TypeError: Unicode-objects must be encoded before hashing

This is the program on Windows 10:

from pywebcopy import save_webpage
save_webpage(
    project_url='https://www.google.com',
    project_folder='C:\temp'  # nb: '\t' is a tab escape; r'C:\temp' would be safer
)

This is the error:

INFO - pywebcopy.parsers.html:125 - Using default Codec on raw_html!
Traceback (most recent call last):
  File "get_webcopy.py", line 12, in <module>
    project_folder='C:\temp'
  File "C:\Program Files\Python37\lib\site-packages\pywebcopy\workers.py", line 60, in save_webpage
    WebPageParser(project_url, project_folder, project_name, encoding=encoding, HTML=html, **kwargs).save_complete()
  File "C:\Program Files\Python37\lib\site-packages\pywebcopy\parsers.py", line 594, in save_complete
    self.save_assets(reset_html=False)
  File "C:\Program Files\Python37\lib\site-packages\pywebcopy\parsers.py", line 578, in save_assets
    self._extractLinks()
  File "C:\Program Files\Python37\lib\site-packages\pywebcopy\parsers.py", line 510, in _extractLinks
    self.url_handler.handle(*i)
  File "C:\Program Files\Python37\lib\site-packages\pywebcopy\parsers.py", line 466, in url_handler
    self._url_handler = UrlHandler(self.url, self.url_obj.file_path)
  File "C:\Program Files\Python37\lib\site-packages\pywebcopy\parsers.py", line 457, in url_obj
    self._url_obj = Url(self.url)
  File "C:\Program Files\Python37\lib\site-packages\pywebcopy\urls.py", line 96, in __init__
    self.default_filename = self.hash() + (guess_extension(url) or '.download')
  File "C:\Program Files\Python37\lib\site-packages\pywebcopy\urls.py", line 108, in hash
    return str(int(hashlib.sha1(self.original_url).hexdigest(), 16) % (10 ** 8))
TypeError: Unicode-objects must be encoded before hashing

Any idea?
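
The traceback points at hashlib.sha1() being fed a str; hashlib requires bytes, so encoding the URL first avoids the error. A sketch of the failing expression with the encode added (assuming UTF-8; not the library's actual patch):

import hashlib

original_url = 'https://www.google.com'
digest = str(int(hashlib.sha1(original_url.encode('utf-8')).hexdigest(), 16) % (10 ** 8))
print(digest)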

BUG REPORT: Log file never flushes causing drive to run out of space

I have a tool that uses pywebcopy's save_webpage for a site with specific IP data. The problem is that I am using a loop in my tool to make repeated calls to pywebcopy with a different project_name each time, BUT the pywebcopy module continues to add to a persistent pywebcopy_log.log file with each call AND writes a copy of this ever-growing log file to the project_folder after each call to save_webpage.

EXPECTED BEHAVIOR: Log file flushed with each call to pywebcopy.save_webpage with a new project_name (and associated project_folder).

As it stands, if each call to pywebcopy.save_webpage() generated a 1MB log, the first call (for the first project_name) would write a 1MB file to project_folder_1, the second call would write a 2MB file to project_folder_2, the third a 3MB file to project_folder_3, and so on.

This becomes a major problem when you might loop through hundreds or thousands of project_names (my tool currently processes 1600 web pages - which is 1600 separate project_names - per batch). As you can probably imagine, this becomes very problematic when the log size of the first few projects is only a few kilobytes but the logs for later projects have grown to 25MB+ each.

This is definitely a bug and needs to be fixed in order to support pywebcopy logging with anything more than a single project_name.
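
Until that is fixed, one possible workaround is to close and detach file handlers between projects (a sketch assuming pywebcopy attaches standard logging.FileHandler instances; not an official fix):

import logging

def reset_log_file_handlers():
    # close and remove any file handlers so the next project logs fresh
    for logger in (logging.getLogger(), logging.getLogger('pywebcopy')):
        for handler in list(logger.handlers):
            if isinstance(handler, logging.FileHandler):
                handler.close()
                logger.removeHandler(handler)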

SAMPLE OUTPUT SHOWING ERRORS AND DISK UTILIZATION:
Preserving 'DNS' from intodns.com for beetfarmprepared.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for cranberrypowers.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for lifesavertip.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for handraisedvote.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for glasscleardeals.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for tallfootball.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for notordered.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for tellingbanks.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for gigglesshared.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for maybeinevitable.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for gettingreadytoleave.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for gettingouthere.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for actuallytoogood.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for laughterjump.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for movinghopfence.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Preserving 'DNS' from intodns.com for stayrealpeace.com... ExceptionType: <class 'OSError'>, Exception: OSError(28, 'No space left on device')
Traceback (most recent call last):
  File "chirp.py", line 494, in <module>
    main()
  File "chirp.py", line 487, in main
    Evidence(cidrfile, sleep)
  File "chirp.py", line 96, in __init__
    self.write_json_to_file(self.dir, self.basename, self.records)
  File "chirp.py", line 474, in write_json_to_file
    with open(jpath, 'w') as f:
OSError: [Errno 28] No space left on device: 'master.batches.20200307.2145/batch001.20200307.2151/batch001.json'
Sun Mar 8 06:38:39 EDT 2020
(venv36) [root@oszlmu01 ~/cpt/chirp]# pwd
/root/cpt/chirp
You have new mail in /var/spool/mail/root
(venv36) [root@oszlmu01 ~/cpt/chirp]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        909M     0  909M   0% /dev
tmpfs           920M  4.0K  920M   1% /dev/shm
tmpfs           920M   97M  823M  11% /run
tmpfs           920M     0  920M   0% /sys/fs/cgroup
/dev/sda5       9.8G  9.1G  782M  93% /
/dev/sda3       2.0G   33M  2.0G   2% /tmp
/dev/sda1       197M  177M   21M  90% /boot
tmpfs           184M     0  184M   0% /run/user/0
(venv36) [root@oszlmu01 ~/cpt/chirp]#

EXAMPLE OF PROBLEM: (Note that size of pywebcopy_log.log continues to increase)
(venv36) [root@oszlmu01 ~/cpt/chirp]# find . -name "pywebcopy_log.log" -exec ls -lh {} \; -rw-r--r--. 1 root root 29M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/quipito.com/pywebcopy_log.log -rw-r--r--. 1 root root 29M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nancydecorism.com/pywebcopy_log.log -rw-r--r--. 1 root root 28M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nancydesignish.com/pywebcopy_log.log -rw-r--r--. 1 root root 28M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nancydesignize.com/pywebcopy_log.log -rw-r--r--. 1 root root 28M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nancydesignery.com/pywebcopy_log.log -rw-r--r--. 1 root root 28M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nancydesignism.com/pywebcopy_log.log -rw-r--r--. 1 root root 28M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nancydetailish.com/pywebcopy_log.log -rw-r--r--. 1 root root 28M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nancydetailize.com/pywebcopy_log.log -rw-r--r--. 1 root root 28M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nancydetailery.com/pywebcopy_log.log -rw-r--r--. 1 root root 28M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nancydetailism.com/pywebcopy_log.log -rw-r--r--. 1 root root 27M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadecorish.com/pywebcopy_log.log -rw-r--r--. 1 root root 27M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadecorize.com/pywebcopy_log.log -rw-r--r--. 1 root root 27M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadecorery.com/pywebcopy_log.log -rw-r--r--. 1 root root 27M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadecorism.com/pywebcopy_log.log -rw-r--r--. 1 root root 27M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadesignish.com/pywebcopy_log.log -rw-r--r--. 1 root root 27M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadesignize.com/pywebcopy_log.log -rw-r--r--. 1 root root 27M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadesignery.com/pywebcopy_log.log -rw-r--r--. 1 root root 27M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadesignism.com/pywebcopy_log.log -rw-r--r--. 1 root root 27M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadetailish.com/pywebcopy_log.log -rw-r--r--. 1 root root 27M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadetailize.com/pywebcopy_log.log -rw-r--r--. 1 root root 26M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadetailery.com/pywebcopy_log.log -rw-r--r--. 1 root root 26M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/normadetailism.com/pywebcopy_log.log -rw-r--r--. 1 root root 26M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nicoledecor.com/pywebcopy_log.log -rw-r--r--. 1 root root 26M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nicoledetail.com/pywebcopy_log.log -rw-r--r--. 1 root root 26M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nicoledecorish.com/pywebcopy_log.log -rw-r--r--. 1 root root 26M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nicoledecorize.com/pywebcopy_log.log -rw-r--r--. 
1 root root 26M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nicoledecorery.com/pywebcopy_log.log -rw-r--r--. 1 root root 26M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nicoledecorism.com/pywebcopy_log.log -rw-r--r--. 1 root root 26M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nicoledesignish.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nicoledesignize.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/uncoveredtaste.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/amusingflower.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/pourcontinue.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/agreementthumb.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/supportlumber.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/divisionbustling.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/marvelousbelief.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/stewjewel.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/willinginsect.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/memorisewrap.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/hystericalfuturistic.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/emptyannounce.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/expertelated.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/supremethoughtful.com/pywebcopy_log.log -rw-r--r--. 1 root root 25M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/workableruddy.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/bootdistance.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/plausiblenose.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/chillyvanish.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/finehesitant.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/broadhumorous.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/concentrateviolet.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/equablecagey.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/attractpossess.com/pywebcopy_log.log -rw-r--r--. 
1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/closeddescribe.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/gratistooth.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/squealingritzy.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lazyillustrious.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/answercoat.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/purringtoes.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/riverdecor.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/washerze.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/washerism.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/washerish.com/pywebcopy_log.log -rw-r--r--. 1 root root 24M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/riverdecorish.com/pywebcopy_log.log -rw-r--r--. 1 root root 23M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/riverdecorize.com/pywebcopy_log.log -rw-r--r--. 1 root root 23M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/riverdecorery.com/pywebcopy_log.log -rw-r--r--. 1 root root 23M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/riverdecorism.com/pywebcopy_log.log -rw-r--r--. 1 root root 23M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/riverdesignish.com/pywebcopy_log.log -rw-r--r--. 1 root root 23M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/riverdesignize.com/pywebcopy_log.log -rw-r--r--. 1 root root 23M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/riverdesignism.com/pywebcopy_log.log -rw-r--r--. 1 root root 23M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/riverlandscapeish.com/pywebcopy_log.log -rw-r--r--. 1 root root 23M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/riverlandscapeism.com/pywebcopy_log.log -rw-r--r--. 1 root root 22M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lakedecorish.com/pywebcopy_log.log -rw-r--r--. 1 root root 22M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lakedecorize.com/pywebcopy_log.log -rw-r--r--. 1 root root 22M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lakedecorism.com/pywebcopy_log.log -rw-r--r--. 1 root root 22M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lakedesignish.com/pywebcopy_log.log -rw-r--r--. 1 root root 22M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lakedesignize.com/pywebcopy_log.log -rw-r--r--. 1 root root 22M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lakedesignery.com/pywebcopy_log.log -rw-r--r--. 1 root root 22M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lakedesignism.com/pywebcopy_log.log -rw-r--r--. 1 root root 22M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lakelandscapeish.com/pywebcopy_log.log -rw-r--r--. 
1 root root 22M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lakelandscapeize.com/pywebcopy_log.log -rw-r--r--. 1 root root 22M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lakelandscapeism.com/pywebcopy_log.log -rw-r--r--. 1 root root 21M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/poddecorish.com/pywebcopy_log.log -rw-r--r--. 1 root root 21M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/poddecorize.com/pywebcopy_log.log -rw-r--r--. 1 root root 21M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/poddecorery.com/pywebcopy_log.log -rw-r--r--. 1 root root 21M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/hutandhouse.com/pywebcopy_log.log -rw-r--r--. 1 root root 21M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/ingloohomes.com/pywebcopy_log.log -rw-r--r--. 1 root root 21M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/brinkhomesinwind.com/pywebcopy_log.log -rw-r--r--. 1 root root 21M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/skilledruth.com/pywebcopy_log.log -rw-r--r--. 1 root root 21M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/clumsylier.com/pywebcopy_log.log -rw-r--r--. 1 root root 21M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/hslipper.com/pywebcopy_log.log -rw-r--r--. 1 root root 21M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/romanboat.com/pywebcopy_log.log -rw-r--r--. 1 root root 20M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/uoutput.com/pywebcopy_log.log -rw-r--r--. 1 root root 20M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/coldclipper.com/pywebcopy_log.log -rw-r--r--. 1 root root 20M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/fuzzypruner.com/pywebcopy_log.log -rw-r--r--. 1 root root 20M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/ibegonia.com/pywebcopy_log.log -rw-r--r--. 1 root root 20M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/realferry.com/pywebcopy_log.log -rw-r--r--. 1 root root 20M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/fbengal.com/pywebcopy_log.log -rw-r--r--. 1 root root 20M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/urgentquartz.com/pywebcopy_log.log -rw-r--r--. 1 root root 20M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/silkycard.com/pywebcopy_log.log -rw-r--r--. 1 root root 20M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/modernmice.com/pywebcopy_log.log -rw-r--r--. 1 root root 20M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/neatradar.com/pywebcopy_log.log -rw-r--r--. 1 root root 20M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/stormystarter.com/pywebcopy_log.log -rw-r--r--. 1 root root 19M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nuttyseat.com/pywebcopy_log.log -rw-r--r--. 1 root root 19M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/ldogsled.com/pywebcopy_log.log -rw-r--r--. 1 root root 19M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/unawaregold.com/pywebcopy_log.log -rw-r--r--. 1 root root 19M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/marriedcall.com/pywebcopy_log.log -rw-r--r--. 
1 root root 19M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/funnyelement.com/pywebcopy_log.log -rw-r--r--. 1 root root 19M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/lbuffet.com/pywebcopy_log.log -rw-r--r--. 1 root root 19M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/amateurlaundry.com/pywebcopy_log.log -rw-r--r--. 1 root root 19M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/visualoval.com/pywebcopy_log.log -rw-r--r--. 1 root root 19M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/oddchive.com/pywebcopy_log.log -rw-r--r--. 1 root root 19M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/puremodem.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/subtlepalm.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/purplegeorge.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/crudelute.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nmitten.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/honorabletelevision.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/fearlesscommercial.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/obviousregulation.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/polishedbank.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/acclaimedunion.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/remarkabletime.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/fluidaccounting.com/pywebcopy_log.log -rw-r--r--. 1 root root 18M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/nutritiouscosts.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/classicworkforce.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/chiefroom.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/experiencedplace.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/formalworkplace.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/substantialtelevision.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/coordinatedbreaks.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/calculatingage.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/judiciouscost.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/largemission.com/pywebcopy_log.log -rw-r--r--. 
1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/beneficialplace.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/astonishingmission.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/caringscheme.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/awarepackage.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/possiblepractices.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/imaginativehealth.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/attentiveemployee.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/profitablebroker.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/delectablemarket.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/valuablebalance.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/anothertax.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/delectabletime.com/pywebcopy_log.log -rw-r--r--. 1 root root 17M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/algorithmscoal.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/algorithmsingredient.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/algorithmsnod.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/archdistributor.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/archoverlay.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/clayboot.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/attorneycharacters.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/bailcharacters.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/bootpossibilities.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/chapteringredient.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/chapterpainter.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/chaptervictim.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/characterspossibilities.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/claydebris.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/clayseas.com/pywebcopy_log.log -rw-r--r--. 
1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/computationscoal.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/courtsseas.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/crossbutt.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/debriscomputations.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/debrispossibilities.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/debristears.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/debristrousers.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/debrisvolt.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/directionssecond.com/pywebcopy_log.log -rw-r--r--. 1 root root 16M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/directionsvictim.com/pywebcopy_log.log -rw-r--r--. 1 root root 15M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/dolliesvolt.com/pywebcopy_log.log -rw-r--r--. 1 root root 15M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/dwellingredient.com/pywebcopy_log.log -rw-r--r--. 1 root root 15M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/dwellseas.com/pywebcopy_log.log -rw-r--r--. 1 root root 15M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/elbowsarch.com/pywebcopy_log.log -rw-r--r--. 1 root root 15M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/experiencedperformance.com/pywebcopy_log.log -rw-r--r--. 1 root root 15M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/familiarweb.com/pywebcopy_log.log -rw-r--r--. 1 root root 15M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/formalopportunity.com/pywebcopy_log.log -rw-r--r--. 1 root root 15M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/forthrightadvertising.com/pywebcopy_log.log -rw-r--r--. 1 root root 15M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/forthrightsector.com/pywebcopy_log.log -rw-r--r--. 1 root root 15M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/forthrightwebsite.com/pywebcopy_log.log -rw-r--r--. 1 root root 14M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/fortunatestock.com/pywebcopy_log.log -rw-r--r--. 1 root root 14M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/glitteringissue.com/pywebcopy_log.log -rw-r--r--. 1 root root 14M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/gracefulexchange.com/pywebcopy_log.log -rw-r--r--. 1 root root 14M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/graciouspayments.com/pywebcopy_log.log -rw-r--r--. 1 root root 14M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/gratefulcosts.com/pywebcopy_log.log -rw-r--r--. 1 root root 14M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/groundedbroker.com/pywebcopy_log.log -rw-r--r--. 1 root root 14M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/groundedhuman.com/pywebcopy_log.log -rw-r--r--. 
1 root root 14M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/harmlesstrademark.com/pywebcopy_log.log -rw-r--r--. 1 root root 13M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/helpfulemployee.com/pywebcopy_log.log -rw-r--r--. 1 root root 13M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/helpfulplanning.com/pywebcopy_log.log -rw-r--r--. 1 root root 13M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/imaginativecosts.com/pywebcopy_log.log -rw-r--r--. 1 root root 13M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/immaculateaccounting.com/pywebcopy_log.log -rw-r--r--. 1 root root 13M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/immaculatetax.com/pywebcopy_log.log -rw-r--r--. 1 root root 13M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/immediateagency.com/pywebcopy_log.log -rw-r--r--. 1 root root 13M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/infatuatedsales.com/pywebcopy_log.log -rw-r--r--. 1 root root 13M Mar 8 06:38 ./master.batches.20200307.2145/batch001.20200307.2151/judiciousloan.com/pywebcopy_log.log

ADDITIONAL LOG DETAILS:
--- Logging error --- Traceback (most recent call last): File "/usr/lib64/python3.6/logging/__init__.py", line 998, in emit self.flush() File "/usr/lib64/python3.6/logging/__init__.py", line 978, in flush self.stream.flush() OSError: [Errno 28] No space left on device Call stack: File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap self._bootstrap_inner() File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib64/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/root/cpt/chirp/venv36/lib64/python3.6/site-packages/pywebcopy/elements.py", line 334, in run % (len(self._stack), self.file_path)) Message: '[2] CSS linked files are found in file [/root/cpt/chirp/master.batches.20200307.2145/batch001.20200307.2151/veteransfinancesolutions.com/intodns.com/static/style/9f1e1094__thickbox.css]' Arguments: () --- Logging error --- Traceback (most recent call last): File "/usr/lib64/python3.6/logging/__init__.py", line 998, in emit self.flush() File "/usr/lib64/python3.6/logging/__init__.py", line 978, in flush self.stream.flush() OSError: [Errno 28] No space left on device Call stack: File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap self._bootstrap_inner() File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib64/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/root/cpt/chirp/venv36/lib64/python3.6/site-packages/pywebcopy/elements.py", line 334, in run % (len(self._stack), self.file_path)) Message: '[2] CSS linked files are found in file [/root/cpt/chirp/master.batches.20200307.2145/batch001.20200307.2151/veteransfinancesolutions.com/intodns.com/static/style/9f1e1094__thickbox.css]' Arguments: () --- Logging error --- Traceback (most recent call last): File "/usr/lib64/python3.6/logging/__init__.py", line 998, in emit self.flush() File "/usr/lib64/python3.6/logging/__init__.py", line 978, in flush self.stream.flush() OSError: [Errno 28] No space left on device Call stack: File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap self._bootstrap_inner() File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib64/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/root/cpt/chirp/venv36/lib64/python3.6/site-packages/pywebcopy/elements.py", line 334, in run % (len(self._stack), self.file_path)) Message: '[2] CSS linked files are found in file [/root/cpt/chirp/master.batches.20200307.2145/batch001.20200307.2151/veteransfinancesolutions.com/intodns.com/static/style/9f1e1094__thickbox.css]' Arguments: () --- Logging error --- Traceback (most recent call last): File "/usr/lib64/python3.6/logging/__init__.py", line 998, in emit self.flush() File "/usr/lib64/python3.6/logging/__init__.py", line 978, in flush self.stream.flush() OSError: [Errno 28] No space left on device Call stack: File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap self._bootstrap_inner() File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib64/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/root/cpt/chirp/venv36/lib64/python3.6/site-packages/pywebcopy/elements.py", line 334, in run % (len(self._stack), self.file_path)) Message: '[2] CSS linked files are found in file 
[/root/cpt/chirp/master.batches.20200307.2145/batch001.20200307.2151/veteransfinancesolutions.com/intodns.com/static/style/9f1e1094__thickbox.css]' Arguments: () --- Logging error --- Traceback (most recent call last): File "/usr/lib64/python3.6/logging/__init__.py", line 998, in emit self.flush() File "/usr/lib64/python3.6/logging/__init__.py", line 978, in flush self.stream.flush() OSError: [Errno 28] No space left on device Call stack: File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap self._bootstrap_inner() File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib64/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/root/cpt/chirp/venv36/lib64/python3.6/site-packages/pywebcopy/elements.py", line 334, in run % (len(self._stack), self.file_path)) Message: '[2] CSS linked files are found in file [/root/cpt/chirp/master.batches.20200307.2145/batch001.20200307.2151/veteransfinancesolutions.com/intodns.com/static/style/9f1e1094__thickbox.css]' Arguments: ()

Python 2 UserDict

For Python 2, UserDict lives in its own module (UserDict.UserDict), so an import error occurs:
import pywebcopy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python-2.7.11_64\lib\site-packages\pywebcopy\__init__.py", line 49, in <module>
    from .configs import config, SESSION
  File "C:\Python-2.7.11_64\lib\site-packages\pywebcopy\configs.py", line 13, in <module>
    from collections import UserDict
ImportError: cannot import name UserDict

Just wanted to give a heads-up in case you want to restrict the lib to py3 or update it for py2 support.
Thanks!
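
If py2 support is kept, the usual shim looks like this (a sketch, not the library's actual patch):

try:
    from collections import UserDict  # Python 3
except ImportError:
    from UserDict import UserDict  # Python 2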

Relative links

I'm trying to download a webpage where the links are relative (for example, an href may point to main/ where it's actually [site being crawled]/folder/main, but we lose the /folder bit). Is there a way to append the parent URL?
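
For reference, relative hrefs are normally resolved against the URL of the page they appear on; the standard library's urljoin shows the expected resolution (illustrative values):

from urllib.parse import urljoin

page_url = 'https://example.com/folder/page.html'
print(urljoin(page_url, 'main/'))  # -> https://example.com/folder/main/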

Cannot import name 'save_webpage' from 'pywebcopy'

I'm trying to replicate the basic example and running into an error.

# pywebcopy.py
from pywebcopy import save_webpage

kwargs = {'project_name': 'some-fancy-name'}

save_webpage(
    url='http://example-site.com/index.html',
    project_folder='path/to/downloads',
    **kwargs
)

Running $ python pywebcopy.py gives me:

Traceback (most recent call last):
  File "pywebcopy.py", line 1, in <module>
    from pywebcopy import save_webpage
  File "/pywebcopy.py", line 1, in <module>
    from pywebcopy import save_webpage
ImportError: cannot import name 'save_webpage' from 'pywebcopy' (/pywebcopy.py)

I already installed pywebcopy using $ pip install pywebcopy

Please advise.
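
Worth noting: the traceback shows the import resolving to /pywebcopy.py, i.e. the script itself. Naming the script pywebcopy.py shadows the installed package, so renaming the file (and removing any leftover pywebcopy.pyc) is the usual fix for this class of error.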

Scraping process stucks

pywebcopy.configs - INFO - Got response 200 from http://fonts.gstatic.com/s/roboto/v15/NdF9MtnOpLzo-noMoG0miPesZW2xOQ-xsNqO47m55DA.woff2
elements - INFO - File of type .woff2 written successfully to /home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/7e4c377b__u0TOpm082MNkS5K0Q4rhqvesZW2xOQ-xsNqO47m55DA.woff2
elements - INFO - File of type .woff2 written successfully to /home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/0e250b22__PwZc-YbIL414wB9rB1IAPRJtnKITppOI_IvcXXDNrsc.woff2
elements - INFO - [0] CSS linked files are found in file [/home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/cc07cc68__NdF9MtnOpLzo-noMoG0miPesZW2xOQ-xsNqO47m55DA.woff2]
elements - INFO - Writing file at location /home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/cc07cc68__NdF9MtnOpLzo-noMoG0miPesZW2xOQ-xsNqO47m55DA.woff2
pywebcopy.configs - INFO - Got response 200 from http://fonts.gstatic.com/s/roboto/v15/gwVJDERN2Amz39wrSoZ7FxTbgVql8nDJpwnrE27mub0.woff2
pywebcopy.configs - INFO - Got response 200 from http://fonts.gstatic.com/s/roboto/v15/u0TOpm082MNkS5K0Q4rhqvesZW2xOQ-xsNqO47m55DA.woff2
elements - INFO - File of type .woff2 written successfully to /home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/cc07cc68__NdF9MtnOpLzo-noMoG0miPesZW2xOQ-xsNqO47m55DA.woff2
elements - INFO - Writing file at location /home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/bdef5120__gwVJDERN2Amz39wrSoZ7FxTbgVql8nDJpwnrE27mub0.woff2
elements - INFO - [0] CSS linked files are found in file [/home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/7e4c377b__u0TOpm082MNkS5K0Q4rhqvesZW2xOQ-xsNqO47m55DA.woff2]
elements - INFO - Writing file at location /home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/7e4c377b__u0TOpm082MNkS5K0Q4rhqvesZW2xOQ-xsNqO47m55DA.woff2
elements - INFO - File of type .woff2 written successfully to /home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/bdef5120__gwVJDERN2Amz39wrSoZ7FxTbgVql8nDJpwnrE27mub0.woff2
elements - INFO - File of type .woff2 written successfully to /home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/7e4c377b__u0TOpm082MNkS5K0Q4rhqvesZW2xOQ-xsNqO47m55DA.woff2
pywebcopy.configs - INFO - Got response 200 from http://fonts.gstatic.com/s/roboto/v15/gwVJDERN2Amz39wrSoZ7FxTbgVql8nDJpwnrE27mub0.woff2
elements - INFO - [0] CSS linked files are found in file [/home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/bdef5120__gwVJDERN2Amz39wrSoZ7FxTbgVql8nDJpwnrE27mub0.woff2]
elements - INFO - Writing file at location /home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/bdef5120__gwVJDERN2Amz39wrSoZ7FxTbgVql8nDJpwnrE27mub0.woff2
elements - INFO - File of type .woff2 written successfully to /home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/fonts.gstatic.com/s/roboto/v15/bdef5120__gwVJDERN2Amz39wrSoZ7FxTbgVql8nDJpwnrE27mub0.woff2
root - INFO - Processing http://www.remixpr.in/Folder/index.php for certificate X.509 retrieval
root - INFO - Fetching certificates from http://www.remixpr.in/Folder/index.php ended.
certificate one line - MIIFpzCCBI+gAwIBAgISBFpnn7qN+tzenpqVs++XHHJpMA0GCSqGSIb3DQEBCwUAMEoxCzAJBgNVBAYTAlVTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0MSMwIQYDVQQDExpMZXQncyBFbmNyeXB0IEF1dGhvcml0eSBYMzAeFw0xOTA5MjIyMTI3NDdaFw0xOTEyMjEyMTI3NDdaMBUxEzARBgNVBAMTCnJlbWl4cHIuaW4wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDL02a0T2ikBCOmO28cOgcw8HO7WGJAogcB4xW9PWleU/SlTbm7/nB1V7pL8BVdB+RJAgjNw81973s7mFx1UsULM+iaP+TwzoXAWNSW0uxCwg8/Psqz9oqw2DA5vIpwGM07CMTZ2LVupgu/HSL7FrSsRPOAr37XPer5zOmoCcg1V0eg7D3ild8xFY2XITn9ZIBr5uhTipnRJE5jkBBdvx3aAYOs4mdSToKfDPVauisSw44c3ngYYekx0kLN4NZBF2A7RYGTugZy6Cjz6eumxExKSphCOkPMpR8wGSvd+NtVAIwhL49V5P5XpTBbmpPUMAF63OujHPL5QIp01Vh6x3rDAgMBAAGjggK6MIICtjAOBgNVHQ8BAf8EBAMCBaAwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMCMAwGA1UdEwEB/wQCMAAwHQYDVR0OBBYEFGkGo178Qrb8WXpvhl8WVkg/tvH/MB8GA1UdIwQYMBaAFKhKamMEfd265tE5t6ZFZe/zqOyhMG8GCCsGAQUFBwEBBGMwYTAuBggrBgEFBQcwAYYiaHR0cDovL29jc3AuaW50LXgzLmxldHNlbmNyeXB0Lm9yZzAvBggrBgEFBQcwAoYjaHR0cDovL2NlcnQuaW50LXgzLmxldHNlbmNyeXB0Lm9yZy8wcQYDVR0RBGowaIIRY3BhbmVsLnJlbWl4cHIuaW6CD21haWwucmVtaXhwci5pboIKcmVtaXhwci5pboISd2ViZGlzay5yZW1peHByLmlughJ3ZWJtYWlsLnJlbWl4cHIuaW6CDnd3dy5yZW1peHByLmluMEwGA1UdIARFMEMwCAYGZ4EMAQIBMDcGCysGAQQBgt8TAQEBMCgwJgYIKwYBBQUHAgEWGmh0dHA6Ly9jcHMubGV0c2VuY3J5cHQub3JnMIIBAwYKKwYBBAHWeQIEAgSB9ASB8QDvAHUAKTxRllTIOWW6qlD8WAfUt2+/WHopctykwwz05UVH9HgAAAFtWxaNbQAABAMARjBEAiBpPidecYxpa8eG2BHncbS1y+dQ0EhqL1obDRhWvZWRvAIgDVxJid4sS4zBmingqLtsqDe43IsaKade0py8jjK0OjQAdgBvU3asMfAxGdiZAKRRFf93FRwR2QLBACkGjbIImjfZEwAAAW1bFo25AAAEAwBHMEUCIQCsJKuz2iJpwYQgoiHLB0KenepU1ce7hfgbabPjv5wtMgIgd16nc5T1eZUrhaVYWVyOMCUvx+q7Yca+lGvYim/DumswDQYJKoZIhvcNAQELBQADggEBAA4kkTiDPhcMrmAgg1xCb3eZyb/endaWVooO+TTgFoSNju9KPkhyCCkzB3SF3M1VbaZI+O/1lip5WV8JjoNxTKKt0eMo5PpyxPebBGoJ/XpetdJT8e/EjRa61CaiJnfY3rs5u9iH9wDD8M7CmrqkK5qD4S68TYcgCb4tXB4bPFklDZ37OkKrShzWN7gDKlkGk8XSUdYMuRn9M2RmLObeKbuZmjBrp2yyGuVTOlPjczFnmsx+21UbDKrqbW8AHPzg5YEbJDr1i7nFvF4ME73/BUc4NNH2s29fyxxKFF5H2GVp86OhI36G9f7OK1ZgxXTbRWiRUZIE0975Pi0GHaWgllE=
root - INFO - Processing http://www.remixpr.in/Folder/index.php ended.
root - INFO - Processing request to scarping url http://www.remixpr.in/Folder/index.php
pywebcopy.configs - INFO - Got response 301 from http://www.remixpr.in/robots.txt
pywebcopy.configs - INFO - Got response 200 from http://www.remixpr.in/Folder/index.php
webpage - INFO - Starting save_html Action on url: 'http://www.remixpr.in/Folder/index.php'
parsers - INFO - Parsing tree with source: <<urllib3.response.HTTPResponse object at 0x7fe00489b690>> encoding and parser <<lxml.etree.HTMLParser object at 0x7fe0079bf910>>
webpage - INFO - WebPage saved successfully to /home/jenkins/phishing_consistency/output/DocuSign/08-03-2020/AS12876/www.remixpr.in/Folder/892302da__index.html
webpage - INFO - Starting save_complete Action on url: ['http://www.remixpr.in/Folder/index.php']
webpage - INFO - Starting save_assets Action on url: 'http://www.remixpr.in/Folder/index.php'
webpage - Level 100 - Queueing download of <21> asset files.

Hi, the scraping process gets stuck every time after several minutes.
It gets stuck on
"webpage - Level 100 - Queueing download of <21> asset files."

Can someone assist me, please?

How to clone linked pages?

I'm running the following example. The target page downloads OK; however, none of the linked pages are being downloaded. Is there a configuration flag I can set to download hyperlinked pages?

from pywebcopy import save_website

kwargs = {'project_name': 'xxxx-clone-eb'}

save_website(
    url='https://xxxxx.com/ARTICLES/xxxx.htm',
    project_folder='/Users/xxxx/Documents/Code/xxxxx/EB',
    **kwargs
)

inconsistent handling of filetypes

Using the WebPage class and WebPage.save_assets(), and having explicitly set pywebcopy.config['allowed_file_ext'] = ['.html', '.css'], I'm seeing inconsistent handling of some filetypes. Specifically, it seems to be misinterpreting filetypes at times (screenshot omitted).

From what I can tell, the same issue does not happen when using pywebcopy.save_webpage().

Code:

# -*- coding: utf-8 -*-

import os
import time
import threading

import pywebcopy

preferred_clock = time.time

project_folder = '/Users/reed/Downloads/scraped_content'
project_name = 'example_project'

urls = [
	'https://codeburst.io/building-beautiful-command-line-interfaces-with-python-26c7e1bb54df',
	'https://owl.purdue.edu/owl/general_writing/academic_writing/establishing_arguments/rhetorical_strategies.html',
	'http://www.history.com/topics/cold-war/hollywood-ten'
]

project_url = urls[0]  # assumed: project_url was referenced below without being defined in the snippet as posted

pywebcopy.config.setup_config(
	project_url=project_url,
	project_folder=project_folder,
	project_name=project_name,
	over_write=True,
	bypass_robots=True,
	debug=False,
	log_file='/Users/reed/Downloads/scraped_content/pwc_log.log',
	join_timeout=1,
	load_css=False,
	load_images=False,
	load_javascript=False
)

pywebcopy.config['allowed_file_ext'] = ['.html','.css']#,'svg','.js','.jpg','.png','.htm','jpeg']

start = preferred_clock()

# method_1
for url in urls:
	pywebcopy.save_webpage(url=url,
						   project_folder=project_folder,
						   project_name=project_name,
						   join_timeout=1,
						   load_css=False,
						   load_images=False,
						   load_javascript=False)
	for thread in threading.enumerate():
	    if thread == threading.main_thread():
	        continue
	    else:
	        thread.join()

print("Execution time : ", preferred_clock() - start)```

How to limit the crawling depth?

It looks like the scan_level parameter handled this earlier, but it has been deprecated. What can I do now to limit the crawl depth?

Process hanging

I'm working to mirror a whole site, and it seems to be going fine, but then it just hangs, basically forever (I left it overnight and it was still hanging).

It doesn't leave a message, it just hangs.

When I interrupt it from the keyboard, the traceback stops here:

File "/usr/local/src/mirror/otf_mirror/.venv/lib/python3.6/site-packages/pywebcopy/crawler.py", line 128, in run
    D_QUEUE.join()              # Wait for the download queue to be emptied

Is there any way to know:

  1. how big the queue is
  2. whether it is actually getting smaller?

When it is running, the process uses pretty much all of the CPU of my VPS, plus about 35% of the memory, but when it hangs, usage drops to 2%.

Is there a way to perhaps tweak the speed so that the queue doesn't get overloaded? (A queue-monitoring sketch follows this report.)

I'm going to try to increase the capacity of my VPS to see if that helps.
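
Not an official API, but a minimal monitoring sketch, assuming pywebcopy.crawler really does expose D_QUEUE as a standard queue.Queue (the traceback above suggests it does):

import time
import threading

from pywebcopy.crawler import D_QUEUE  # assumed importable, per the traceback above


def monitor(q, interval=5.0):
    # Print the approximate queue size periodically so it is visible
    # whether the download queue is actually shrinking over time.
    while True:
        print("download queue size: ~%d" % q.qsize())
        time.sleep(interval)


# Run the monitor as a daemon thread so it never blocks shutdown.
threading.Thread(target=monitor, args=(D_QUEUE,), daemon=True).start()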

Crawler.crawl() only saves first page

Issue: crawler.crawl() only saves the first page of the website.
Expected result: All pages of the website are downloaded
Description:
I am trying this out on my own local machine using python's http.server. Here is the directory structure of my test website:

.
├── 1.html
├── 2.html
├── folder
│   └── folder.html
└── index.html

index.html contains the following:

<!DOCTYPE html>
<p style="color:yellow">
The quick brown fox jumps over the lazy dog
</p>

<a href="1.html">1.html</a>
<a href="2.html">2.html</a>
<a href="folder/folder.html">folder.html</a>

Here is the code of my script:

from pywebcopy import Crawler, config

class Downloader:

    # Class variables
    USERAGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:75.0) Gecko/20100101 Firefox/75.0"

    def download_website(self, url, folder):
        config.setup_config(
            project_url=url,
            project_folder=folder,
            zip_project_folder=False,
            over_write=True,
            bypass_robots=True
        )
        headers = config.get("http_headers")
        headers["User-Agent"] = self.USERAGENT
        config["http_headers"] = headers
        crawler = Crawler()
        print(f"Downloading {url} to {folder}")
        crawler.crawl()

website_file_path = "/tmp/savefiles"
url = "http://localhost:8000"
downloader = Downloader()
downloader.download_website(url, website_file_path)

Output of the code:

/home/user/script/venv/lib64/python3.6/site-packages/pywebcopy/webpage.py:84: UserWarning: Global Configuration is not setup. You can ignore this if you are going manual.This is just one time warning regarding some unexpected behavior.
  "Global Configuration is not setup. You can ignore this if you are going manual."
Downloading http://localhost:8000 to /tmp/savefiles
pywebcopy.configs - INFO     - Got response 200 from http://localhost:8000/
parsers    - INFO     - Parsing tree with source: <<urllib3.response.HTTPResponse object at 0x7fe40bc2e1d0>> encoding <ISO-8859-1> and parser <<lxml.etree.HTMLParser object at 0x7fe40be7e178>>
webpage    - INFO     - Starting save_complete Action on url: ['http://localhost:8000/']
webpage    - INFO     - Starting save_assets Action on url: 'http://localhost:8000/'
webpage    - Level 100 - Queueing download of <3> asset files.
webpage    - INFO     - Starting save_html Action on url: 'http://localhost:8000/'
webpage    - INFO     - WebPage saved successfully to /tmp/savefiles/localhost/localhost/index.html
pywebcopy.configs - INFO     - Got response 200 from http://localhost:8000/2.html
parsers    - INFO     - Parsing tree with source: <<urllib3.response.HTTPResponse object at 0x7fe40bc2e518>> encoding <ISO-8859-1> and parser <<lxml.etree.HTMLParser object at 0x7fe40be7e178>>
pywebcopy.configs - INFO     - Got response 200 from http://localhost:8000/1.html
parsers    - INFO     - Parsing tree with source: <<urllib3.response.HTTPResponse object at 0x7fe40bc2e908>> encoding <ISO-8859-1> and parser <<lxml.etree.HTMLParser object at 0x7fe40be7e210>>
pywebcopy.configs - INFO     - Got response 200 from http://localhost:8000/folder/folder.html
parsers    - INFO     - Parsing tree with source: <<urllib3.response.HTTPResponse object at 0x7fe40bc2ee48>> encoding <ISO-8859-1> and parser <<lxml.etree.HTMLParser object at 0x7fe40be7e2a8>>

After this, running the ls command at /tmp/savefiles/localhost/localhost shows that only index.html was downloaded. What I hope to achieve is to download 1.html, 2.html and folder/folder.html as well.

Python version: 3.6.8
pywebcopy==6.3.0

save_webpage() never exits

I've found that when I use save_webpage per the examples, my script never completes. Is there a way to overcome this issue? My use case needs automation, which makes hanging processes undesirable.
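
A workaround that appears in other reports on this page is to join the leftover worker threads with a timeout, so the interpreter can exit even if a download is stuck. A minimal sketch (the URL, folder, and 30-second timeout are placeholders):

import threading

from pywebcopy import save_webpage

save_webpage(
    url='https://httpbin.org/',           # placeholder URL
    project_folder='/tmp/savedpages',     # placeholder folder
    project_name='my_site',
    bypass_robots=True,
)

# Join every non-main thread with a timeout so a stuck download
# cannot keep the process alive forever.
for thread in threading.enumerate():
    if thread is threading.main_thread():
        continue
    thread.join(timeout=30)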

AssertionError: A file like object with read method is required!

I always get an AssertionError when running the program. This is my code, which uses pywebcopy and then saves the zip file to S3. Could someone help me? Thanks! (macOS, pywebcopy==6.2.0)

from pywebcopy import save_webpage


class Webp:

    def __init__(self, downloadpath):
        self.downloadpath = downloadpath
        self.file = downloadpath

    def websaving(self, url, projectName):

        # self.file is the path of the downloaded zip file
        self.file = self.downloadpath + '/' + projectName + '.zip'
        download_folder = self.downloadpath

        kwargs = {'bypass_robots': True, 'project_name': projectName}

        save_webpage(url, download_folder, **kwargs)


link = "a web page link"
path = 'path to save them locally'
name = 'Test'

web = Webp(path)
web.websaving(link, name)

ImportError: cannot import name UserDict

I encountered an issue today while trying to import pywebcopy. This is related to a previously closed issue. I tried installing via pip and also building it manually by cloning from GitHub.

Traceback (most recent call last):
  File "test_untitled2.py", line 13, in <module>
    from pywebcopy import save_webpage
  File "/usr/local/lib/python2.7/dist-packages/pywebcopy/__init__.py", line 49, in <module>
    from .configs import config, SESSION
  File "/usr/local/lib/python2.7/dist-packages/pywebcopy/configs.py", line 13, in <module>
    from collections import UserDict
ImportError: cannot import name UserDict

Any idea what the problem could be?

Thanks in advance.

FileNotFoundError while trying to do save_webpage

Running the code snippet below, taken from the documentation, throws a FileNotFoundError. Can you please check?


from pywebcopy import save_webpage

url = 'https://pypi.org/project/pywebcopy/'

kwargs = {'project_name': 'some-fancy-name'}

save_webpage(
    url=url,
    project_folder='output/',
    **kwargs
)

Error Message:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/vinay/PycharmProjects/Web_Scraping/output/some-fancy-name/output/some-fancy-name/pywebcopy_log.log'

Full traceback:

  File "/Users/vinay/PycharmProjects/Web_Scraping/html_download.py", line 11, in <module>
    **kwargs
  File "/Users/vinay/anaconda3/envs/Web_Scraping/lib/python3.7/site-packages/pywebcopy/api.py", line 137, in save_website
    config.setup_config(url, project_folder, project_name, **kwargs)
  File "/Users/vinay/anaconda3/envs/Web_Scraping/lib/python3.7/site-packages/pywebcopy/configs.py", line 176, in setup_config
    self.setup_paths(project_folder, project_name)
  File "/Users/vinay/anaconda3/envs/Web_Scraping/lib/python3.7/site-packages/pywebcopy/configs.py", line 138, in setup_paths
    f_stream = logging.FileHandler(lf, 'w')
  File "/Users/vinay/anaconda3/envs/Web_Scraping/lib/python3.7/logging/__init__.py", line 1092, in __init__
    StreamHandler.__init__(self, self._open())
  File "/Users/vinay/anaconda3/envs/Web_Scraping/lib/python3.7/logging/__init__.py", line 1121, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/vinay/PycharmProjects/Web_Scraping/output/some-fancy-name/output/some-fancy-name/pywebcopy_log.log'

Python Env: 3.7.3
OS: MacOS

save_website/crawl() does not download PDF

I tried to clone a complete website and noticed that the PDF files were skipped.
This is the code I currently use:

from pywebcopy import Crawler, config

config.setup_config(
    project_url=URL,
    project_folder=ProjectFolder,
    project_name=ProjectName,
    bypass_robots=True,
)

crawler = Crawler()
crawler.crawl()

But

save_website(
    url='http://example-site.com/index.html',
    project_folder='path/to/downloads',
    **kwargs
)

produced the same result.

One of the URLs I tested was: https://www.akkufit-berger.de/kataloge/#akkus
As far as I can see, the PDF extension is part of the "safe_file_exts" list, which is the default option.

Even if I point the URL directly at the PDF file, it just downloads an HTML file that has a different file size than the original PDF and cannot be opened in the browser or a PDF viewer.

Question: can it continue a suspended job?

I was trying to clone a webpage, but it froze after a while, probably due to some network hiccups. I had to kill the process and start over (only to get stuck again, to be honest). Is it possible for this module to continue a suspended job, skipping files that have already been saved?

(Also, what are the timeout thresholds and retry limits for the requests? Can I specify these values?)

(Also, can I make it print some logs if a request fails or times out and is being retried?)

Windows 10, Python 3.8.1. Module installed via pip install pywebcopy and invoked from the command line as python -m pywebcopy save_webpage http://y.tuwan.com/chatroom/3701 ./ --bypass_robots.
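
pywebcopy does not document timeout or retry knobs in these reports, but import tracebacks elsewhere on this page show a module-level SESSION in pywebcopy.configs. Assuming that is a requests-style session, retries with logging could be bolted on like this (a sketch under that assumption, not a confirmed API):

import logging

from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

from pywebcopy.configs import SESSION  # assumed to be a requests.Session

# urllib3 logs each retry at DEBUG level, which gives the
# "print some logs on retry" behavior without patching pywebcopy.
logging.basicConfig(level=logging.DEBUG)

retry = Retry(total=3, backoff_factor=1.0)
SESSION.mount('http://', HTTPAdapter(max_retries=retry))
SESSION.mount('https://', HTTPAdapter(max_retries=retry))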

Script not completed

Hey! This is my code:


import pywebcopy
url = 'https://habr.com/ru/post/203012/'
kwargs = {}
project_folder = 'download'
project_name = 'habr'
logfile = 'log_file.log'
# You should always start with setting up the config or use apis
pywebcopy.config.setup_config(url, project_folder, project_name, log_file=logfile, **kwargs)
wp = pywebcopy.WebPage()
wp.get(url)
#wp.save_complete()
wp.save_html()
wp.save_assets()

Threads get stuck after scraping. How can I exit the script in the usual way?
(screenshot omitted)

Double Folder Prepending

Hey, I ran into an issue with the folder structure pywebcopy writes...
I'm using Anaconda Python 3, with pywebcopy 6.0.0.

My code looks like this:

from pywebcopy import save_webpage

kwargs = {'project_name': 'a_name'}

save_webpage(
    url=URL,
    project_folder='p_folder',
    **kwargs
)

URL is a valid URL, but when I run the program, I get this result:

C:\ProgramData\Anaconda3\python.exe H:/Python/PowerBee/URLDownloader.py
('t5_38jid', 'aipum7', 'Supreme Court lets Trump transgender troop restrictions take effect', 'https://www.reuters.com/article/us-usa-court-transgender/supreme-court-lets-trump-transgender-troop-restrictions-take-effect-idUSKCN1PG1RI')
Traceback (most recent call last):
  File "H:/Python/PowerBee/URLDownloader.py", line 38, in <module>
    **kwargs
  File "C:\ProgramData\Anaconda3\lib\site-packages\pywebcopy\api.py", line 56, in save_webpage
    config.setup_config(url, project_folder, project_name, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pywebcopy\configs.py", line 176, in setup_config
    self.setup_paths(project_folder, project_name)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pywebcopy\configs.py", line 138, in setup_paths
    f_stream = logging.FileHandler(lf, 'w')
  File "C:\ProgramData\Anaconda3\lib\logging\__init__.py", line 1092, in __init__
    StreamHandler.__init__(self, self._open())
  File "C:\ProgramData\Anaconda3\lib\logging\__init__.py", line 1121, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
FileNotFoundError: [Errno 2] No such file or directory: 'H:\\Python\\PowerBee\\p_folder\\a_name\\p_folder\\a_name\\pywebcopy_log.log'

Process finished with exit code 1

It looks like the folder name and project name get appended to the path twice, somehow. The folder structure does seem to be created correctly by pywebcopy, though. I'm not sure where in the library that file opening occurs. Could you take a look at it?

load_css/images/javascript arguments not working

Using the load_css, load_images, and load_javascript arguments for config.setup_config() and save_webpage() doesn't seem to restrict the types of files downloaded. Setting all of them to False still resulted in CSS, image, and JavaScript files being downloaded.

That said, they do appear to have some effect. When I set those arguments to False using config.setup_config() alone, they seem to have no effect, and the code below still hangs when saving the first URL in the list. But when I also pass those parameters to the save_webpage() function, it still downloads all those file types (so it doesn't work as I would expect), yet it does let the program run to completion. It's unclear why passing those arguments directly to save_webpage() allows the program to finish.

Next, I tried to set the allowed_file_ext argument for both config.setup_config() and save_webpage(), but neither accepts that argument.

So finally I directly set config['allowed_file_ext'] = ['.html','.css','svg','.js','.jpg','.png','.htm','jpeg'], and that did seem to restrict the file types downloaded for the most part, although it still downloads some other types like '.pwc'.

Code:

# -*- coding: utf-8 -*-

import time
import threading

import pywebcopy

preferred_clock = time.time

project_folder = '/Users/reed/Downloads/scraped_content'
project_name = 'example_project'

urls = [
    'https://codeburst.io/building-beautiful-command-line-interfaces-with-python-26c7e1bb54df',
    'https://owl.purdue.edu/owl/general_writing/academic_writing/establishing_arguments/rhetorical_strategies.html',
    'http://www.history.com/topics/cold-war/hollywood-ten'
]

# project_url was not defined in the original snippet; assuming the first URL.
project_url = urls[0]

pywebcopy.config.setup_config(
    project_url=project_url,
    project_folder=project_folder,
    project_name=project_name,
    over_write=True,
    bypass_robots=True,
    debug=False,
    log_file='/Users/reed/Downloads/scraped_content/pwc_log.log',
    join_timeout=5,
    load_css=False,
    load_images=False,
    load_javascript=False
)

start = preferred_clock()

# pywebcopy.config['allowed_file_ext'] = ['.html','.css','svg','.js','.jpg','.png','.htm','jpeg']

# method_1
for url in urls:
    pywebcopy.save_webpage(url=url,
                           project_folder=project_folder,
                           project_name=project_name,
                           join_timeout=5)  # ,
                           # load_css=False,
                           # load_images=False,
                           # load_javascript=False)

for thread in threading.enumerate():
    if thread == threading.main_thread():
        continue
    else:
        thread.join()

print("Execution time : ", preferred_clock() - start)

File ext '' is not allowed for file at

#!/usr/bin/python
# -*- coding: utf-8 -*-

from pywebcopy import WebPage

url = 'http://baijiahao.baidu.com/s?id=1622464741855375146'
project_loc = 'C:/Users/康康/Desktop'

wp = WebPage(url,
             project_folder=project_loc,
             default_encoding=None,
             HTML=None,
             bypass_robots=True
             )

wp.save_complete()

I got: CRITICAL - pywebcopy.core.new_file:189 - File ext '' is not allowed for file at 'http://t10.baidu.com/it/u=788283957,2664293696&fm=173&app=49&f=JPEG?w=528&h=446&s=27C8D010495E60CC40E4045A0300C0F2'

but I have set bypass_robots.
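
An untested workaround sketch: other snippets on this page set config['allowed_file_ext'], so if the extension check consults that list (an assumption, not a confirmed fix), allowing the empty extension might let extension-less asset URLs through:

import pywebcopy

# Assumption: the "File ext '' is not allowed" check consults this list.
# Allowing the empty string may let query-string-only asset URLs through.
pywebcopy.config['allowed_file_ext'].append('')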

config.setup_config hangs for specific url

problematic url = 'http://www.mcclatchydc.com/opinion/article24604315.html'

The code I'm using (which seems to work with most other URLs):

from pywebcopy import config

url = 'http://www.mcclatchydc.com/opinion/article24604315.html'
config.setup_config(url,
                    project_name=str(hash(url)),
                    project_folder=base_dir_txt,
                    bypass_robots=True,
                    over_write=False,
                    debug=True)

OUTPUT:
pywebcopy.configs - DEBUG - {'debug': True, 'log_file': '/Users/reed/Downloads/scraped_content/-4279476708681949691/pywebcopy_log.log', 'project_name': '-4279476708681949691', 'project_folder': '/Users/reed/Downloads/scraped_content/-4279476708681949691', 'over_write': False, 'bypass_robots': True, 'zip_project_folder': True, 'delete_project_folder': False, 'allowed_file_ext': ['.html', '.php', '.asp', '.aspx', '.htm', '.xhtml', '.css', '.json', '.js', '.xml', '.svg', '.gif', '.ico', '.jpeg', '.pdf', '.jpg', '.png', '.ttf', '.eot', '.otf', '.woff', '.woff2', '.pwcf'], 'http_headers': {'Accept-Language': 'en-US,en;q=0.9', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0 PyWebCopyBot/6.1.1'}, 'load_css': True, 'load_javascript': True, 'load_images': True, 'join_timeout': None, 'project_url': 'http://www.mcclatchydc.com/opinion/article24604315.html'}
urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): www.mcclatchydc.com:80

That's it; it just hangs there and never finishes executing that command.

how to use cookies?

How can I use cookies the same way a browser would, where the cookies are saved automatically and no manual copying is needed?
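
There is no documented cookie option in these reports, but the import tracebacks on this page show a module-level SESSION in pywebcopy.configs. Assuming it behaves like a requests.Session, cookies copied from a browser could be loaded into it before saving (a sketch under that assumption; the cookie name, URL, and folder are placeholders):

from pywebcopy import save_webpage
from pywebcopy.configs import SESSION  # assumed requests-style session

# Pre-load cookies so every request made while copying carries them.
SESSION.cookies.update({'sessionid': 'value-copied-from-browser'})

save_webpage(
    url='https://example.com/',       # placeholder URL
    project_folder='/tmp/savedpages', # placeholder folder
    project_name='cookie_test',
    bypass_robots=True,
)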

ValueError: path is on mount 'L:', start on mount 'C:'

I am getting this error. Any idea why?

The code is just this, in scrap.py:

from pywebcopy import save_webpage
kwargs = {
    'project_folder': '/assets/saved_webPages',
    'project_name': 'localhost'
}
url = 'https://en.wikipedia.org/wiki/EndNote'
save_webpage(
    url=url,
    **kwargs
)

Error Logs
C:\xampp\htdocs\project\venv\Scripts\python.exe C:/xampp/htdocs/project/scrap.py
config - INFO - Got response <Response [200]> from https://en.wikipedia.org/robots.txt
config - INFO - Got response <Response [200]> from https://en.wikipedia.org/wiki/EndNote
webpage - INFO - Starting save_complete Action on url: ['https://en.wikipedia.org/wiki/EndNote']
parsers - INFO - Parsing tree with source: <<urllib3.response.HTTPResponse object at 0x03617850>> encoding and parser <<lxml.etree.HTMLParser object at 0x035BEC30>>
Traceback (most recent call last):
  File "C:/xampp/htdocs/project/scrap.py", line 21, in <module>
    **kwargs
  File "C:\xampp\htdocs\project\venv\lib\site-packages\pywebcopy\api.py", line 88, in save_webpage
    wp.save_complete()
  File "C:\xampp\htdocs\project\venv\lib\site-packages\pywebcopy\webpage.py", line 275, in save_complete
    self.parse()  # call in the action
  File "C:\xampp\htdocs\project\venv\lib\site-packages\pywebcopy\parsers.py", line 192, in parse
    o = factory(elem, attr, url, pos)
  File "C:\xampp\htdocs\project\venv\lib\site-packages\pywebcopy\elements.py", line 456, in make_element
    rel_path = pathname2url(obj.relative_to(utx.file_path))
  File "C:\xampp\htdocs\project\venv\lib\site-packages\pywebcopy\urls.py", line 469, in relative_to
    rel_path = os.path.relpath(head, start)
  File "C:\Users\My Computer\AppData\Local\Programs\Python\Python37-32\lib\ntpath.py", line 562, in relpath
    path_drive, start_drive))
ValueError: path is on mount 'L:', start on mount 'C:'

Process finished with exit code 1

program hangs and does not exit

Trying Examples 1 & 2 from the "How to - Save Single Webpage" section in readme.md, as well as method 3 from examples.py. Using Python 3.7, pywebcopy 6.3, and one of the example URLs from examples.py: 'https://codeburst.io/building-beautiful-command-line-interfaces-with-python-26c7e1bb54df'

Issues: Methods 1 & 2 hang every time. Method 3 appears to be deprecated. Nothing appears in my log_file with this approach, so it's difficult to troubleshoot further. And the join_timeout setting doesn't appear to have any effect.

Based on the other open issue (#35), I also included the thread-closing loop from examples.py.

Files are downloading, but when I try to open the main HTML file it never shows any of the images (perhaps it never got to the point of saving them?).

My code, modified from examples:

import time
import threading

import pywebcopy

preferred_clock = time.time

project_url = 'https://codeburst.io/building-beautiful-command-line-interfaces-with-python-26c7e1bb54df'
project_folder = '/Users/user/Downloads/scraped_content'
project_name = 'example_project'

pywebcopy.config.setup_config(
	project_url=project_url,
	project_folder=project_folder,
	project_name=project_name,
	over_write=True,
	bypass_robots=True,
	debug=False,
	log_file='/Users/user/Downloads/scraped_content/pwc_log.log',
	join_timeout=30
)

start = preferred_clock()

# method_1 - This one hangs every time (never finishes so I have to halt).
'''
pywebcopy.save_webpage(url=project_url,
					   project_folder=project_folder,
					   project_name=project_name)
'''

# method_2 - This one also hangs every time
wp = pywebcopy.WebPage()
wp.get(project_url)
wp.save_complete()
wp.shutdown()

# method_3_from_examples.py - this one is deprecated: 
# "Direct initialisation with url is not supported now."
'''
pywebcopy.WebPage(url=project_url,project_folder=project_folder).save_complete()
'''

for thread in threading.enumerate():
    if thread == threading.main_thread():
        continue
    else:
        thread.join()

print("Execution time : ", preferred_clock() - start)```

save "complete webpage" page.html and /page

Hello! Good job on this project. However, I was wondering whether it's possible to save the HTML page as "page.html" and have all the assets under one folder with a matching name, e.g. "page_files", where that folder can hold the JS, CSS, photos, etc. (and can contain many subfolders as well).
For a given URL: https://www.example.com/page.html
can I have this as output?
-- example.com
| --- page_files (folder containing all assets: js, css, ...)
| --- page.html

Thank you in advance

Documented command-line interface example fails

The command-line interface documentation claims that a command like this will work:

python -m pywebcopy save_webpage https://www.maskupmanateecoalition.org

But instead the following error is thrown:

Traceback (most recent call last):
  File "C:\Users\thequ\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\thequ\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\thequ\anaconda3\lib\site-packages\pywebcopy\__main__.py", line 69, in <module>
    fire.Fire(Commands)
  File "C:\Users\thequ\anaconda3\lib\site-packages\fire\core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\thequ\anaconda3\lib\site-packages\fire\core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\thequ\anaconda3\lib\site-packages\fire\core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "C:\Users\thequ\anaconda3\lib\site-packages\pywebcopy\__main__.py", line 47, in save_webpage
    return swp(*args, **kwargs)
TypeError: save_webpage() missing 1 required positional argument: 'project_folder'

TypeError

In the bypass_robots statement it shows me an error, @rajatomar788, even though I assigned True:
bypass_robots=True
TypeError: unsupported operand type(s) for ** or pow(): 'bool' and 'dict'

site restrictions

It works on some websites but fails on others. I looked in the issues for any solution to the "permission error" and found one: ignoring robots.txt. But it still gets a permission error. There is just a small difference: with the robots.txt bypass it downloads one more page than before. No luck with this site: "http://mathworld.wolfram.com/".

program download method

Will the program check whether the file already exists, or will it download it anyway and replace it if it exists? This is very important because it affects scraping time, bandwidth usage, and spider detection; some websites detect that you are scraping them if you download the same files again and again.
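
The over_write flag that appears in several snippets on this page controls whether an existing project folder gets replaced; whether individual files are re-downloaded or skipped is not confirmed anywhere in these reports, so treat this as a sketch of the only related knob visible here (URL and folder are placeholders):

from pywebcopy import config

config.setup_config(
    project_url='https://example.com/',  # placeholder URL
    project_folder='/tmp/savedpages',    # placeholder folder
    project_name='my_site',
    over_write=False,  # keep an existing project folder instead of replacing it
    bypass_robots=True,
)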

ValueError: path is on mount 'S:', start on mount 'C:'

I am getting this error. Any idea why?

from pywebcopy import save_webpage

kwargs = {'project_name': 'some-fancy-name'}
url = 'https://www.bloomberg.com/company/career/global-data-externship/'

save_webpage(
    url=url,
    project_folder=r'C:\Users\Kevin\Downloads\Cheats',
    **kwargs
)

Log
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\ProgramData\Anaconda3\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pywebcopy\elements.py", line 331, in run
    contents = self.replace_urls(req.content, self.repl)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pywebcopy\elements.py", line 292, in replace_urls
    contents = CSS_URLS_RE.sub(repl, css_string)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pywebcopy\elements.py", line 273, in repl
    url = pathname2url(relate(new_element.file_path, self.file_path))
  File "C:\ProgramData\Anaconda3\lib\site-packages\pywebcopy\urls.py", line 438, in relate
    return os.path.join(os.path.relpath(target_dir, start_dir), os.path.basename(target_file))
  File "C:\ProgramData\Anaconda3\lib\ntpath.py", line 562, in relpath
    path_drive, start_drive))
ValueError: path is on mount 'S:', start on mount 'C:'

Download only of package not possible

To create an RPM package I need to download the tar.gz. But this fails, as setup.py imports pywebcopy, so you need to have all of its dependencies installed on the build server:

buildhost :: /tmp » pip download --no-deps --no-binary=:all: pywebcopy
Collecting pywebcopy
  Downloading https://files.pythonhosted.org/packages/69/c1/e9b8429a34ccbfa446a0a019a21303d9496823b31a2c80f8946cb114c130/pywebcopy-6.3.0.tar.gz (52kB)
     |████████████████████████████████| 61kB 1.6MB/s 
  Saved ./pywebcopy-6.3.0.tar.gz
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-download-v2d6nmo5/pywebcopy/setup.py", line 14, in <module>
        import pywebcopy
      File "/tmp/pip-download-v2d6nmo5/pywebcopy/pywebcopy/__init__.py", line 50, in <module>
        from .parsers import Parser, MultiParser
      File "/tmp/pip-download-v2d6nmo5/pywebcopy/pywebcopy/parsers.py", line 19, in <module>
        from pyquery import PyQuery
    ModuleNotFoundError: No module named 'pyquery'
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-download-v2d6nmo5/pywebcopy/

Hangup when calling save_webpage from active thread in a multithreaded python app

On Windows 10, executing:

kwargs = {'project_name': my_name}
pywebcopy.save_webpage(url=my_url, project_folder=my_folder, **kwargs)

works well in a standalone Python interpreter launched from a console.

Executing the exact same code from an active thread in a multithreaded Python program, however, the call to save_webpage never returns.

Path is on mount S: start on mount C:

Windows 10 Pro, Python 3.7.8.
Just the one site, "https://breakingnewscentral.com/", seems to have this issue.

This happens with either path: 'Downloads/this_dumb_thing' or 'C:/Users/guy/Downloads/this_dumb_thing'.

Currently I use this:

from pywebcopy import WebPage, config

def scrape(url, folder, timeout=1):
    config.setup_config(url, folder)

    wp = WebPage()
    wp.get(url)

    # start the saving process
    wp.save_complete()

    # join the sub threads
    for t in wp._threads:
        if t.is_alive():
            t.join(timeout)

    # location of the html file written
    return wp.file_path

Before that, I just used the example code and got the same thing:

kwargs = {'project_name': 'thedetrend'}

save_webpage(
    # url of the website
    url=url,

    # folder where the copy will be saved
    project_folder=folder,
    **kwargs
)

ModuleNotFound error

Hello,

I am successfully using pywebcopy with Python 3.7 on Mac. However, when I run the same code under Python 3.5 on Linux, with the exact same modules installed via virtualenv, it fails with the error shown below. The pywebcopy error message does not indicate which module is missing, and since the same modules are installed on both Mac and Linux, this seems like it may be a bug?

(venv35) [root@eye ~/cpt/chirp]# python chirp.py example.cidr
Traceback (most recent call last):
  File "chirp.py", line 27, in <module>
    import pywebcopy
  File "/root/cpt/chirp/venv35/lib/python3.5/site-packages/pywebcopy/__init__.py", line 49, in <module>
    from .configs import config, SESSION
  File "/root/cpt/chirp/venv35/lib/python3.5/site-packages/pywebcopy/configs.py", line 18, in <module>
    from .exceptions import AccessError
  File "/root/cpt/chirp/venv35/lib/python3.5/site-packages/pywebcopy/exceptions.py", line 30, in <module>
    class DependencyNotFoundError(ModuleNotFoundError):
NameError: name 'ModuleNotFoundError' is not defined
(venv35) [root@eye ~/cpt/chirp]#

List of packages from Mac with Python 3.7 (working):

(venv37) MBP-16:chirp$ pip list
Package           Version   
----------------- ----------
astroid           2.3.3     
beautifulsoup4    4.8.2     
certifi           2019.11.28
chardet           3.0.4     
cssselect         1.1.0     
dnspython         1.16.0    
fire              0.2.1     
future            0.18.2    
idna              2.9       
isort             4.3.21    
lazy-object-proxy 1.4.3     
lxml              4.5.0     
mccabe            0.6.1     
parse             1.15.0    
pip               20.0.2    
pylint            2.4.4     
pyquery           1.4.1     
python-whois      0.7.2     
pywebcopy         6.1.1     
requests          2.23.0    
setuptools        40.8.0    
six               1.14.0    
soupsieve         2.0       
termcolor         1.1.0     
typed-ast         1.4.1     
urllib3           1.25.8    
w3lib             1.21.0    
wrapt             1.11.2    
(venv37) MBP-16:chirp$ 

List of packages from Linux with Python 3.5 (not working):

(venv35) [root@eye ~/chirp]# pip list
Package           Version
----------------- ----------
astroid           2.3.3
beautifulsoup4    4.8.2
certifi           2019.11.28
chardet           3.0.4
cssselect         1.1.0
dnspython         1.16.0
fire              0.2.1
future            0.18.2
idna              2.9
isort             4.3.21
lazy-object-proxy 1.4.3
lxml              4.5.0
mccabe            0.6.1
parse             1.15.0
pip               20.0.2
pylint            2.4.4
pyquery           1.4.1
python-whois      0.7.2
pywebcopy         6.1.1
requests          2.23.0
setuptools        18.2
six               1.14.0
soupsieve         2.0
termcolor         1.1.0
typed-ast         1.4.1
urllib3           1.25.8
w3lib             1.21.0
wrapt             1.11.2
(venv35) [root@eye ~/chirp]#
