Own Webarchive

A simple, fast, and easy-to-use web archive for personal or home-network use.

Supported storage formats

  • headers — save all headers from the response
  • pdf — save the page as a PDF
  • single_file — save the HTML and all its resources (CSS, JS, images) into one HTML file

Requirements

  • Go 1.19 or higher
  • wkhtmltopdf binary in $PATH (to save pages as PDF)

Configuration

The service is configured via environment variables. The available variables are:

  • DB
    • DB_PATH — path for the database files (default ./db)
  • LOGGING
    • LOGGING_DEBUG — enable debug logs (default false)
  • API
    • API_ADDRESS — address the API server listens on (default 0.0.0.0:5001)
  • UI
    • UI_ENABLED — enable the built-in web UI (default true)
    • UI_PREFIX — URL prefix for the web UI (default /)
    • UI_THEME — UI theme name (default basic). No other values are available yet
  • PDF
    • PDF_LANDSCAPE — use landscape page orientation instead of portrait (default false)
    • PDF_GRAYSCALE — use grayscale filter for the output pdf (default false)
    • PDF_MEDIA_PRINT — use media type print for the request (default true)
    • PDF_ZOOM — page zoom factor (default 1.0, i.e. no zoom)
    • PDF_VIEWPORT — use specified viewport value (default 1280x720)
    • PDF_DPI — use specified DPI value for the output pdf (default 150)
    • PDF_FILENAME — use specified name for output pdf file (default page.pdf)

Note: The prefix WEBARCHIVE_ can be added to the environment variable names in case of conflicts with other software.

Usage

1. Start the server

Start without Docker

go run ./cmd/server/main.go

Change the API address

API_ADDRESS=127.0.0.1:3001 go run ./cmd/server/main.go

Start in Docker

docker compose up -d webarchive

2. Add a page

curl -X POST --location "http://localhost:5001/api/v1/pages" \
    -H "Content-Type: application/json" \
    -d "{
          \"url\": \"https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1937\",
          \"formats\": [
            \"pdf\",
            \"headers\"
          ]
        }" | jq .

or

curl -X POST --location \
  "http://localhost:5001/api/v1/pages?url=https%3A%2F%2Fgithub.com%2Fwkhtmltopdf%2Fwkhtmltopdf%2Fissues%2F1937&formats=pdf%2Cheaders&description=Foo+Bar"

3. Get the page's info

curl -X GET --location "http://localhost:5001/api/v1/pages/$page_id" | jq .

where $page_id is the value of the id field from the previous command's response. Once the status field in the response is success (or with_errors), the results field contains all processed formats with the ids of the stored files.

4. Open file in browser

xdg-open "http://localhost:5001/api/v1/pages/$page_id/file/$file_id"

where $page_id is the same page id as in the previous step, and $file_id is the id of the file you want to open.

5. List all stored pages

curl -X GET --location "http://localhost:5001/api/v1/pages" | jq .

Roadmap

  • Save page to pdf
  • Save URL headers
  • Save page to the single-page html
  • Save page to html with separate resource files (?)
  • Basic web UI
  • Optional authentication
  • Multi-user access
  • Support SQL database with or without separate files storage
  • Tags/Categories
  • Save page to markdown

Contributors

  • derfenix
