github-userx / pywebarchive

This project forked from bmjcode/pywebarchive

Software for reading Apple's webarchive format

Home Page: https://pypi.org/project/pywebarchive/

License: MIT License

Python 99.69% Batchfile 0.31%

pywebarchive's Introduction

pywebarchive is software for reading Apple's webarchive format. It consists of two pieces:

  • Webarchive Extractor converts webarchive files to standard pages you can open in any browser.
  • The webarchive Python module is the code "under the hood" that makes the Extractor work. It's available for other applications to use, too.

pywebarchive is open-source software released under the permissive MIT License. Development takes place on GitHub.

Features

  • Available for Windows, macOS, and Linux
  • Converts webarchive files to plain HTML
  • Handles images, scripts, and style sheets
  • Converted pages display just like they would in Safari (apart from normal cross-browser rendering differences)

Downloads

The latest version is pywebarchive 0.5.0 (released April 16, 2022). See the changelog for what's new.

Note: If you're not reading this on GitHub, this section may be out of date. In that case, the latest version of pywebarchive is available at https://github.com/bmjcode/pywebarchive.

File                          Size    Description
Webarchive.Extractor.exe      7.3 MB  Webarchive Extractor for 32-bit Windows
Webarchive.Extractor.x64.exe  8.0 MB  Webarchive Extractor for 64-bit Windows
pywebarchive-0.5.0.zip                source code (zip)
pywebarchive-0.5.0.tar.gz             source code (tar.gz)

The Windows version of Webarchive Extractor runs on Windows 7 and higher. It is a portable application -- it doesn't require installation, and won't write to Application Data or the Windows Registry.

On macOS and Linux (and Windows with Python installed), you can run Webarchive Extractor directly from the source code. Both command-line (extractor.py) and graphical (extractor-gui.py) versions are included.

If you're a Python developer, you can also install the webarchive module from PyPI using pip install pywebarchive. Note the module you import is just webarchive, but the package you install is pywebarchive; this is because an unrelated project already claimed the shorter package name.
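
The split between the install name and the import name can be illustrated with a short sketch. The helper names describe_install and convert are hypothetical and exist only for this example. The webarchive.open() and extract() calls reflect the module's basic usage as commonly shown for it; treat them as an assumption and check them against the version you have installed.

```python
from importlib import metadata, util


def describe_install(import_name="webarchive", dist_name="pywebarchive"):
    """Report whether the module is importable and which package version is installed.

    The distribution (what pip installs) and the module (what you import)
    are looked up separately because their names differ.
    """
    spec = util.find_spec(import_name)
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"importable": spec is not None, "version": version}


def convert(source_path, output_path):
    """Convert one webarchive file to an HTML page (hypothetical helper).

    Assumes the module exposes open() and the returned archive has
    extract(); verify this against your installed version.
    """
    import webarchive  # imported lazily so the check above works without it

    archive = webarchive.open(source_path)
    archive.extract(output_path)


if __name__ == "__main__":
    print(describe_install())
```

If describe_install() reports that the module is not importable, running pip install pywebarchive (not pip install webarchive) is the step to retry.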

Requirements

More information

Webarchive is the default format for the "Save As" command in Apple's Safari browser. (Other Apple software also uses it internally for various purposes.) Its main advantage is that it can save all the content on a webpage -- including external media like images, scripts, and style sheets -- in a single file. However, the webarchive format is proprietary and not publicly documented, and most other browsers cannot open webarchive files. pywebarchive solves this by converting webarchive files to standard HTML pages, which can be opened in any browser or editor.
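
To make the "single file" idea concrete, here is a minimal sketch using only Python's standard library. It assumes, based on commonly observed files rather than any official documentation, that a webarchive is a binary property list whose main page sits under a WebMainResource key and whose images, scripts, and style sheets sit under WebSubresources. The key names and the sample data are assumptions for illustration, and this is not how pywebarchive itself is implemented.

```python
import plistlib


def build_sample_archive():
    """Build a tiny synthetic archive in memory (not a real Safari export)."""
    archive = {
        "WebMainResource": {
            "WebResourceURL": "https://example.com/",
            "WebResourceMIMEType": "text/html",
            "WebResourceTextEncodingName": "UTF-8",
            "WebResourceData": b"<html><body><img src='logo.png'></body></html>",
        },
        "WebSubresources": [
            {
                "WebResourceURL": "https://example.com/logo.png",
                "WebResourceMIMEType": "image/png",
                "WebResourceData": b"\x89PNG placeholder bytes",
            }
        ],
    }
    return plistlib.dumps(archive, fmt=plistlib.FMT_BINARY)


def list_resources(raw):
    """Return (url, mime type, size in bytes) for the main page and each subresource."""
    plist = plistlib.loads(raw)
    resources = [plist["WebMainResource"]] + list(plist.get("WebSubresources", []))
    return [
        (r["WebResourceURL"], r["WebResourceMIMEType"], len(r["WebResourceData"]))
        for r in resources
    ]


def main_html(raw):
    """Decode the main page's markup using its declared text encoding."""
    main = plistlib.loads(raw)["WebMainResource"]
    encoding = main.get("WebResourceTextEncodingName", "utf-8")
    return main["WebResourceData"].decode(encoding)


if __name__ == "__main__":
    raw = build_sample_archive()
    for url, mime, size in list_resources(raw):
        print(url, mime, size)
```

A converter still has to do more than this, such as writing each subresource to disk and rewriting the page's links to point at the local copies, which is the part pywebarchive handles for you.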

The name "pywebarchive" simply reflects that this is webarchive-handling software written in the Python programming language.

pywebarchive follows the Unix philosophy of "do one thing and do it well". With that in mind, pywebarchive deliberately omits all features unrelated to its purpose of converting webarchive files so other browsers can open them. In particular, pywebarchive does not support writing webarchive files, and there are no plans to add this in a future release.

pywebarchive's internals are fairly well documented. The code includes extensive comments explaining how it works and why it does things the way it does. In addition, pywebarchive features dozens of unit tests to check that the code actually does what we think it does, which is further confirmed by manual testing before each release.
