
The training project "PageLoader" on the Python Development course on Hexlet.io

Home Page: https://ru.hexlet.io/programs/python/projects/51



PageLoader


The training project "PageLoader" on the Python Development course on Hexlet.io.


Dependencies

Dependencies required for the project code to work correctly:

  • python = "^3.8"
  • requests = "^2.28.1"
  • beautifulsoup4 = "^4.11.1"
  • progress = "^1.6"

Description

PageLoader is a command-line utility that downloads pages from the Internet and saves them to your computer. Together with the page, it downloads all of its resources (images, stylesheets and scripts), which makes it possible to open the page without an Internet connection.

Saving pages in a browser works on the same principle.
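As a rough illustration of the resource discovery step, the sketch below collects the URLs of images, stylesheets and scripts referenced by a page. It is not the project's code: the project depends on beautifulsoup4, while this sketch swaps in the standard library `html.parser` to stay self-contained. The names `RESOURCE_ATTRS`, `ResourceCollector` and `find_resources` are hypothetical, and it does not filter resources by host.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag -> attribute holding the resource URL (illustrative mapping, an assumption).
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of images, scripts and linked files."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        value = dict(attrs).get(wanted)
        if value:
            # Resolve relative references against the page address.
            self.resources.append(urljoin(self.page_url, value))


def find_resources(html, page_url):
    """Return resource URLs in the order they appear in the markup."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    return collector.resources
```

Each collected URL can then be downloaded and the corresponding attribute in the saved HTML rewritten to point at the local copy.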

The utility downloads resources in multiple threads and shows the progress for each resource in the terminal.
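A minimal sketch of how multi-threaded downloading with per-resource progress could look, assuming a thread pool from the standard library. The function `download_all`, its `fetch` parameter and the printed progress format are illustrative assumptions, not the project's actual implementation (the project uses the `progress` package for its progress bar).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=4):
    """Fetch every URL in a worker thread and report progress as each one finishes.

    `fetch` is any callable that takes a URL and returns its content
    (hypothetical parameter, injected to keep the sketch network-free).
    """
    results = {}
    total = len(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it is downloading.
        futures = {pool.submit(fetch, url): url for url in urls}
        for done, future in enumerate(as_completed(futures), start=1):
            url = futures[future]
            results[url] = future.result()  # re-raises any exception from fetch
            print(f"Resources Loading {done * 100 // total}% [{done}/{total}]")
    return results
```

With the `requests` library, `fetch` could be as simple as `lambda url: requests.get(url).content`. Resources finish in whatever order the network allows, which is why the progress counter is driven by `as_completed` rather than by the order of the input list.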

Installation

Python

Before installing the package, make sure that Python 3.8 or higher is installed:

# Windows, Ubuntu, MacOS:
>> python --version # or python -V
Python 3.8.0+

⚠️ If a command without a version does not work, specify the Python version explicitly: python3 --version.
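The same check can be made from inside the interpreter. The snippet below is a small sketch; the names `MIN_VERSION` and `python_is_supported` are illustrative, and the 3.8 threshold comes from the dependency list above.

```python
import sys

# Minimum version taken from the project's dependency list (python = "^3.8").
MIN_VERSION = (3, 8)


def python_is_supported(version_info=sys.version_info):
    """Return True when the major.minor version is at least MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_is_supported() else "too old"
    print(f"Python {current} is {status}")
```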

If you have an older version installed, update it as follows:

# Windows:
# download and run the latest installer from the official Python website
# (Python itself cannot be upgraded through pip)
# Ubuntu:
>> sudo apt-get install --only-upgrade python3.X
# MacOS:
>> brew update && brew upgrade python
# * X - version number to be installed

If you don't have Python installed, you can download and install it from the official Python website. On Ubuntu or MacOS it is better to install it through a package manager. Open a terminal and run the command for your operating system:

# Ubuntu:
>> sudo apt update
>> sudo apt install python3.X
# MacOS:
# https://brew.sh/index_ru.html
>> brew install [email protected]
# * X - version number to be installed

❗ Operating system builds and versions differ widely, so a single set of instructions cannot cover every case. If you are running an OS other than those listed above, or you get errors after running the suggested commands, search Stack Overflow for answers: someone has probably run into the same problem before you. Setting up the environment is not easy! 🙂

Poetry

The project uses the Poetry manager. Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you. You can read more about this tool on the official Poetry website.

Poetry provides a custom installer that will install poetry isolated from the rest of your system by vendorizing its dependencies. This is the recommended way of installing poetry.

# Windows (WSL), Linux, MacOS:
>> curl -sSL https://install.python-poetry.org | python3 -
# Windows (Powershell):
>> (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
# If you have installed Python through the Microsoft Store, replace "py" with "python" in the command above.

⚠️ On some systems, python may still refer to Python 2 instead of Python 3. The Poetry Team suggests a python3 binary to avoid ambiguity.

⚠️ By default, Poetry is installed into a platform and user-specific directory:

  • ~/Library/Application Support/pypoetry on MacOS.
  • ~/.local/share/pypoetry on Linux/Unix.
  • %APPDATA%\pypoetry on Windows.

If you wish to change this, you may define the $POETRY_HOME environment variable:

>> curl -sSL https://install.python-poetry.org | POETRY_HOME=/etc/poetry python3 -

Add Poetry to your PATH.

Once Poetry is installed and in your $PATH, you can execute the following:

>> poetry --version

Project package

To work with the package, clone the repository to your computer with the git clone command:

# clone via HTTPS:
>> git clone https://github.com/IgorGakhov/python-project-51.git
# clone via SSH:
>> git clone [email protected]:IgorGakhov/python-project-51.git

Then change into the project directory and install the package:

>> cd python-project-51
>> poetry build
>> python3 -m pip install --user dist/*.whl
# If you have previously installed a package and want to update it, use the following command:
# >> python3 -m pip install --user --force-reinstall dist/*.whl

Finally, we can move on to using the project functionality!


Usage

As external library

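from page_loader import download
url_address = 'https://page-loader.hexlet.repl.co/'  # page to download
destination = '/home/user/page_storage'  # existing directory, absolute path
file_path = download(url_address, destination)  # path to the saved HTML file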

As CLI tool

Help

If you are unsure how to use the utility, call its help command:

>> page-loader --help
usage: page-loader [-h] [--output DESTINATION] url_address

Downloads the page from the network and puts it in the specified existing directory (default: working directory).

positional arguments:
  url_address           page being downloaded

options:
  -h, --help            show this help message and exit
  --output DESTINATION  output directory (default: current dir)
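An interface like the one in the help output above can be described with the standard `argparse` module. The sketch below reproduces the documented arguments and help strings; the function name `build_parser` is hypothetical, and the project's own `cli.py` may be organized differently.

```python
import argparse
import os


def build_parser():
    """Build a parser matching the documented page-loader interface."""
    parser = argparse.ArgumentParser(
        prog="page-loader",
        description=(
            "Downloads the page from the network and puts it in the "
            "specified existing directory (default: working directory)."
        ),
    )
    parser.add_argument("url_address", help="page being downloaded")
    parser.add_argument(
        "--output",
        metavar="DESTINATION",
        default=os.getcwd(),  # fall back to the current working directory
        help="output directory (default: current dir)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url_address, args.output)
```

Using `metavar="DESTINATION"` keeps the attribute name `args.output` short while showing the more descriptive placeholder in the usage line.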


Demo

⚡ Only absolute file paths are supported.

📌 Page loading

The utility downloads resources and shows the progress of each resource in the terminal.

Example:

>> page-loader --output /home/user/page_storage https://page-loader.hexlet.repl.co/
12:41:24 INFO: Initiated download of page https://page-loader.hexlet.repl.co/ to local directory «/home/user/page_storage» ...
12:41:25 INFO: Response from page https://page-loader.hexlet.repl.co/ received.
Page available for download!
Resources Loading |████████                        | 25%   [1/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/script.js saved successfully!
Resources Loading |████████████████                | 50%   [2/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/assets/professions/nodejs.png saved successfully!
Resources Loading |████████████████████████        | 75%   [3/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/assets/application.css saved successfully!
Resources Loading |████████████████████████████████| 100%   [4/4]
12:41:26 INFO: [+] Resource https://page-loader.hexlet.repl.co/courses saved successfully!

12:41:26 INFO: FINISHED! Loading is complete successfully!
The downloaded page is located in the «/home/user/page_storage/page-loader-hexlet-repl-co.html» file.

/home/user/page_storage/page-loader-hexlet-repl-co.html
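The output file name in the example is derived from the page address. One way to get the same result is sketched below, under the assumption that the scheme is dropped and every character that is not a letter or digit becomes a hyphen. The function `page_file_name` is hypothetical; the project's `name_converter.py` may apply different rules, for example for resources that already have a file extension.

```python
import re
from urllib.parse import urlparse


def page_file_name(url):
    """Turn a page URL into a flat file name ending in .html."""
    parsed = urlparse(url)
    # Drop the scheme, keep host and path, ignore a trailing slash.
    raw = (parsed.netloc + parsed.path).rstrip("/")
    # Replace everything except letters and digits with a hyphen.
    slug = re.sub(r"[^A-Za-z0-9]", "-", raw)
    return slug + ".html"


print(page_file_name("https://page-loader.hexlet.repl.co/"))
# page-loader-hexlet-repl-co.html
```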



Development

Dev Dependencies

List of dev-dependencies:

  • flake8 = "^4.0.1"
  • pytest = "^7.1.3"
  • pytest-cov = "^3.0.0"
  • requests-mock = "^1.10.0"

Project Organization

>> tree .
.
├── page_loader
│   ├── __init__.py
│   ├── load_processor
│   │   ├── __init__.py
│   │   ├── downloader.py
│   │   ├── file_system_guide.py
│   │   ├── html_parser.py
│   │   ├── name_converter.py
│   │   ├── data_loader.py
│   │   └── saver.py
│   ├── cli.py
│   ├── logger.py
│   ├── progress.py
│   └── scripts
│       ├── __init__.py
│       └── run.py
├── tests
│   ├── auxiliary.py
│   ├── fixtures
│   │   ├── downloaded_nodejs_course.html
│   │   └── mocks
│   │       ├── assets-application.css
│   │       ├── assets-professions-nodejs.png
│   │       ├── courses.html
│   │       ├── packs-js-runtime.js
│   │       └── source_nodejs_course.html
│   ├── test_cli.py
│   ├── test_downloader.py
│   ├── test_file_system_guide.py
│   └── test_html_parser.py
├── journal.log
├── Makefile
├── poetry.lock
├── pyproject.toml
├── README.md
└── setup.cfg

Useful commands

The commands most used in development are listed in the Makefile:

make package-install
Installs the package in the user environment.
make build
Builds the distribution of the Poetry package.
make package-force-reinstall
Reinstalls the package in the user environment.
make lint
Checks the code with the linter.
make test
Runs the tests.
make fast-check
Builds the distribution, reinstalls it in the user environment, then checks the code with the tests and the linter.

Thank you for your attention!

👨‍💻 Author: @IgorGakhov
