wabarc / wayback

An archiving tool with an IM-style interface that prioritizes privacy and accessibility, integrated with various archival services including Internet Archive, archive.today, IPFS, Telegraph, and file systems.

Home Page: https://docs.wabarc.eu.org

License: GNU General Public License v3.0

Dockerfile 0.82% Makefile 1.10% Go 91.28% Shell 2.33% Roff 1.24% HCL 0.22% JavaScript 0.89% HTML 2.12% Procfile 0.01%
telegram archive wayback-machine save-the-internet snapshot-webpage internet-archive memento ipfs heroku mastodon

wayback's Introduction

Wayback

Wayback is a web archiving and playback tool that allows users to capture and preserve web content. It provides an IM-style interface for receiving and presenting archived web content, and a search and playback service for retrieving previously archived pages. Wayback is designed to be used by web archivists, researchers, and anyone who wants to preserve web content and access it in the future.

Features

  • Free and open-source
  • Exposes Prometheus metrics
  • Cross-platform compatibility
  • Batch archiving of multiple URLs for faster processing
  • Built-in CLI (wayback) for convenient use
  • Serves as a Tor Hidden Service or local web entry for added privacy and accessibility
  • Integrates with Internet Archive, archive.today, IPFS, and Telegraph
  • Runs as a daemon service that interacts with IRC, Matrix, Telegram bot, Discord bot, Mastodon, Twitter, and XMPP
  • Publishes wayback results to a Telegram channel, Mastodon, and GitHub Issues for sharing
  • Stores archived files on disk for offline use
  • Downloads streaming media (requires FFmpeg) for convenient media archiving

Getting Started

For a comprehensive guide, please refer to the complete documentation.

Installation

The simplest, cross-platform way is to download from GitHub Releases and place the executable file in your PATH.

From source:

go install github.com/wabarc/wayback/cmd/wayback@latest

From GitHub Releases:

curl -fsSL https://get.wabarc.eu.org | sh

or via Bina:

curl -fsSL https://bina.egoist.dev/wabarc/wayback | sh

Using Snapcraft (on GNU/Linux):

sudo snap install wayback

Via APT:

curl -fsSL https://repo.wabarc.eu.org/apt/gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/packages.wabarc.gpg
echo "deb [arch=amd64,arm64,armhf signed-by=/usr/share/keyrings/packages.wabarc.gpg] https://repo.wabarc.eu.org/apt/ /" | sudo tee /etc/apt/sources.list.d/wayback.list
sudo apt update
sudo apt install wayback

Via RPM:

sudo rpm --import https://repo.wabarc.eu.org/yum/gpg.key
sudo tee /etc/yum.repos.d/wayback.repo > /dev/null <<EOT
[wayback]
name=Wayback Archiver
baseurl=https://repo.wabarc.eu.org/yum/
enabled=1
gpgcheck=1
gpgkey=https://repo.wabarc.eu.org/yum/gpg.key
EOT

sudo dnf install -y wayback

Via Homebrew:

brew tap wabarc/wayback
brew install wayback

Usage

Command line

$ wayback -h

A command-line tool and daemon service for archiving webpages.

Usage:
  wayback [flags]

Examples:
  wayback https://www.wikipedia.org
  wayback https://www.fsf.org https://www.eff.org
  wayback --ia https://www.fsf.org
  wayback --ia --is -d telegram -t your-telegram-bot-token
  WAYBACK_SLOT=pinata WAYBACK_APIKEY=YOUR-PINATA-APIKEY \
    WAYBACK_SECRET=YOUR-PINATA-SECRET wayback --ip https://www.fsf.org

Flags:
      --chatid string      Telegram channel id
  -c, --config string      Configuration file path, defaults: ./wayback.conf, ~/wayback.conf, /etc/wayback.conf
  -d, --daemon strings     Run as daemon service, supported services are telegram, web, mastodon, twitter, discord, slack, irc, xmpp
      --debug              Enable debug mode (default mode is false)
  -h, --help               help for wayback
      --ia                 Wayback webpages to Internet Archive
      --info               Show application information
      --ip                 Wayback webpages to IPFS
      --ipfs-host string   IPFS daemon host, do not require, unless enable ipfs (default "127.0.0.1")
  -m, --ipfs-mode string   IPFS mode (default "pinner")
  -p, --ipfs-port uint     IPFS daemon port (default 5001)
      --is                 Wayback webpages to Archive Today
      --ph                 Wayback webpages to Telegraph
      --print              Show application configurations
  -t, --token string       Telegram Bot API Token
      --tor                Snapshot webpage via Tor anonymity network
      --tor-key string     The private key for Tor Hidden Service
  -v, --version            version for wayback

Examples

Archive one or more URLs to Internet Archive and archive.today:

wayback https://www.wikipedia.org

wayback https://www.fsf.org https://www.eff.org

Archive a URL to Internet Archive, archive.today, or IPFS:

# Internet Archive
$ wayback --ia https://www.fsf.org

# archive.today
$ wayback --is https://www.fsf.org

# IPFS
$ wayback --ip https://www.fsf.org

When using IPFS, you can also specify a pinning service:

$ export WAYBACK_SLOT=pinata
$ export WAYBACK_APIKEY=YOUR-PINATA-APIKEY
$ export WAYBACK_SECRET=YOUR-PINATA-SECRET
$ wayback --ip https://www.fsf.org

# or

$ WAYBACK_SLOT=pinata WAYBACK_APIKEY=YOUR-PINATA-APIKEY \
    WAYBACK_SECRET=YOUR-PINATA-SECRET wayback --ip https://www.fsf.org

More details about pinning service.

With a Telegram bot:

wayback --ia --is --ip -d telegram -t your-telegram-bot-token

Publish message to your Telegram channel at the same time:

wayback --ia --is --ip -d telegram -t your-telegram-bot-token --chatid your-telegram-channel-name

You can also run in debug mode:

wayback -d telegram -t YOUR-BOT-TOKEN --debug

Serve on both Telegram and a Tor Hidden Service:

wayback -d telegram -t YOUR-BOT-TOKEN -d web

URLs from file:

wayback url.txt
cat url.txt | wayback

Configuration Parameters

Look at the full list of configuration options.

Deployment

Docker/Podman

docker pull wabarc/wayback
docker run -d wabarc/wayback wayback -d telegram -t YOUR-BOT-TOKEN # without telegram channel
docker run -d wabarc/wayback wayback -d telegram -t YOUR-BOT-TOKEN --chatid YOUR-CHANNEL-USERNAME # with telegram channel

1-Click Deploy

Deploy Deploy to Render

Contributing

We encourage all contributions to this repository! Open an issue! Or open a Pull Request!

If you're interested in contributing to wayback itself, read our contributing guide to get started.

Note: All interaction here should conform to the Code of Conduct.

License

This software is released under the terms of the GNU General Public License v3.0. See the LICENSE file for details.

wayback's People

Contributors

bfagundez, dependabot-preview[bot], dependabot[bot], folliehiyuki, kezhenxu94, pcgeek86, renovate[bot], waybackarchiver, web-flow, xiaoxiangxianzi

wayback's Issues

Strip more query parameters

Currently, wayback automatically strips query parameters such as utm_* and weibo_id from URLs; see the helper repository for the full list.

This issue collects query parameters that need to be stripped. If you know of similar query parameters or have any suggestions, please add them below.
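
As a rough illustration of the stripping described above, here is a minimal Go sketch built on the standard net/url package. It is not the helper repository's implementation: the function name stripTracking and the two lists are assumptions, and only utm_* and weibo_id come from this issue.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Illustrative lists only; the real list lives in the helper repository
// and is not reproduced here.
var blockedPrefixes = []string{"utm_"}
var blockedNames = map[string]bool{"weibo_id": true}

// stripTracking (hypothetical name) returns rawURL without the query
// parameters matched by the lists above.
func stripTracking(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for name := range q {
		if blockedNames[name] {
			q.Del(name)
			continue
		}
		for _, prefix := range blockedPrefixes {
			if strings.HasPrefix(name, prefix) {
				q.Del(name)
				break
			}
		}
	}
	// Encode sorts the remaining parameters by key, so their order may
	// differ from the input URL.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	out, err := stripTracking("https://example.com/post?id=42&utm_source=feed&weibo_id=1")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

New parameters collected in this issue would only need to be added to one of the two lists.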

Tagging system for wayback

To improve the experience of searching archived web pages, publishing services (e.g. GitHub Issues, Telegram channel, etc.) need a tagging system.

Proposal:

Release the tagging system in two versions, V1 and V2.

For V1, tags are supplied by the user, extracted, and prepended to the head of the text for publishing.

For V2, building on V1, tags can also be extracted automatically.
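
One possible reading of V1, sketched in Go under assumptions: tags are taken to be whitespace-separated tokens starting with "#" in the user's message, and extractTags and prependTags are hypothetical names rather than functions in wayback.

```go
package main

import (
	"fmt"
	"strings"
)

// extractTags returns every whitespace-separated token that starts with
// "#". Working on whole tokens keeps URL fragments such as
// https://example.com/#top from being mistaken for tags.
func extractTags(input string) []string {
	var tags []string
	for _, field := range strings.Fields(input) {
		if strings.HasPrefix(field, "#") && len(field) > 1 {
			tags = append(tags, field)
		}
	}
	return tags
}

// prependTags puts the tags found in input at the head of body.
func prependTags(input, body string) string {
	tags := extractTags(input)
	if len(tags) == 0 {
		return body
	}
	return strings.Join(tags, " ") + "\n\n" + body
}

func main() {
	msg := "#privacy https://example.com/#top #web"
	fmt.Println(prependTags(msg, "Archived: https://example.com/#top"))
}
```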

Pretty print command output

Layout

Using the Style StyleConnectedRounded:

╭─ https://example.com
│  ├─ IA: https://web.archive.org/web/20220101123456/https://example.com
│  ├─ IP: https://ipfs.io/bafabc...def
│  ├─ IS: https://archive.today/abcdf
│  ├─ PH: https://telegra.ph/example-01-01
│  ╰─ Artifacts
│     ├─ /path/to/example.png
│     ├─ ...
│     ╰─ /path/to/example.warc
╰─ https://example.org
   ╰─ ...

Related projects

Got error install from GoBinaries

Currently, installing wayback from GoBinaries via curl -sf https://gobinaries.com/wabarc/wayback/cmd/wayback | sh fails, because wayback has required Go 1.16.x since version 0.11.0, but GoBinaries still uses Go 1.15.x.

$ curl -sf https://gobinaries.com/wabarc/wayback/cmd/wayback | sh

  ==> Downloading github.com/wabarc/wayback/cmd/wayback@master
  ==> Resolved version master to v0.12.1
  ==> Downloading binary for linux amd64

  Error downloading, got 500 response from server

Waiting on tj/gobinaries#39.

Expose status interface for users

Columns:

  • running state
  • uptime
  • version
  • archived total
  • rate limit (used/left)
  • etc

  1. For Telegram
    • Add /status command to show service status columns
    • Append default commands to /help command
  2. For Web
    • Add /metrics URI to expose status columns

Handle `publish` in a separate goroutine

When the wayback task is coupled with the publish task, timeouts can occur. Delegating the publish task to a separate goroutine should avoid this, and it also makes the code clearer and simpler.

Rate limit

Rate limit for wayback only.

For conversation services (Telegram, Discord, and beyond):

  • limit by user-id

For httpd service:

  • limit by IP

archive.today is unavailable

Bug Report

Current Behavior
When running the wayback command to archive a web page, I got the following error message.

html to archive.today failed: archive.today is unavailable.

Environment

  • Wayback version(s): 0.14.1
  • Golang version: go1.14 darwin/amd64
  • OS: macOS 11.2 (20D64)

Possible Solution

archive.today now redirects to archive.ph. Maybe we should also use the archive.ph domain?

Improvement for illegal command

Bug Report

Current Behavior
Sending an illegal command to the Telegram bot produces a response that omits the command name.

/ is no specified command

Available commands:
/help - Show help information
/metrics - Show service metrics
/playback - Playback archived url

Expected behavior/code

/command is no specified command

Available commands:
/help - Show help information
/metrics - Show service metrics
/playback - Playback archived url

Add support for Discord

Implementations:

Wayback and playback are now supported on Discord, both via direct message and by mentioning the bot in a channel.

For Discord bot:

A Discord application requires the bot and applications.commands scopes under OAuth2 - SCOPES, and requires the Send Messages and Attach Files permissions.

The slash commands supported by the Discord bot are:

  1. /help - show help information (configured help text is required)
  2. /metrics - show service metrics (enabled metrics is required)
  3. /playback - playback URLs

Configurations:

There are three environment variables for configuring a Discord daemon service.

  • WAYBACK_DISCORD_TOKEN (required)
  • WAYBACK_DISCORD_CHANNEL
  • WAYBACK_DISCORD_HELPTEXT
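
For illustration, the three variables could be read and validated as below. The DiscordConfig type and loadDiscordConfig function are hypothetical and do not describe how wayback actually loads its configuration; only the variable names and the fact that the token is required come from the list above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// DiscordConfig is a hypothetical container for the three variables.
type DiscordConfig struct {
	Token    string
	Channel  string
	HelpText string
}

// loadDiscordConfig reads the variables through getenv (os.Getenv in
// production, a stub in tests) and rejects a missing token.
func loadDiscordConfig(getenv func(string) string) (DiscordConfig, error) {
	cfg := DiscordConfig{
		Token:    getenv("WAYBACK_DISCORD_TOKEN"),
		Channel:  getenv("WAYBACK_DISCORD_CHANNEL"),
		HelpText: getenv("WAYBACK_DISCORD_HELPTEXT"),
	}
	if cfg.Token == "" {
		return DiscordConfig{}, errors.New("WAYBACK_DISCORD_TOKEN is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadDiscordConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("publishing to channel:", cfg.Channel)
}
```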

Documentation:

Library:

using bina for installations

A small concern about the dependencies of the project: https://github.com/btcsuite/btcd is a dependency in the packages, and I don't see why this package needs it.

The other reason for this issue is that you could use goblin for installation; I've added support for resolving Go packages from github.com to goblin.

Here's a screenshot of the installation working
This will be live in the next few hours

Add support for Slack

Implementation:

Steps to create a new bot:

  1. Create an App
  2. Generate an App-Level Token, scopes: connections:write
  3. Enable Socket Mode
  4. Enable Events
    • Subscribe to bot events: app_mention and message.im
    • Subscribe to events on behalf of users: message.im
  5. Under OAuth & Permissions, set User Token Scopes: chat:write, files:write
  6. Install to Workspace to get the Bot User OAuth Token
  7. In App Home, check "Allow users to send Slash commands and messages from the messages tab"

Features:

  • Receive requests from a direct message
  • Receive requests from a channel with mention
  • Publish results to multiple slots
  • Create external links for file sharing (file size up to 5 GB for free plan)
  • Provide external links to files for publishing (related #78)

Documentation:

Library:

Publish release note to Telegram channel

Append a job to the Release workflow that pushes a notification to the Telegram channel using appleboy/telegram-action.

The message should contain the repository name as a hashtag, such as #wayback.

Steps:

  1. Upload gittaglogs.txt from the release job
  2. Download gittaglogs.txt
  3. Prepend the hashtag to gittaglogs.txt
  4. Push notification

The input variables for actions:

  • format: markdown
  • message_file: same as gittaglogs.txt on the release job

Misc:

Create a new bot and add it to the channel.
