Comments (4)
I like what ZeroNet is doing in this space: https://github.com/HelloZeroNet/ZeroNet
from archivebox.
In theory, with enough people running BA we could archive a significant portion of Soundcloud before they go out of business. 😁
Step 1: Merkle tree for identifying and querying archive blobs across a distributed system: https://gist.github.com/pirate/0a3545254615b985727b49bc5c3d99cf
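To make the idea concrete, here is a minimal sketch of computing a Merkle root over a list of archive blobs. It is not necessarily the scheme used in the linked gist: the function names, the choice of SHA-256, and the rule of duplicating the last node on odd-sized levels are all assumptions made for illustration.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(blobs: list[bytes]) -> str:
    """Return the hex Merkle root of a list of blobs.

    Hypothetical scheme: leaves are SHA-256 hashes of each blob, and each
    parent is the SHA-256 of its two children concatenated.
    """
    if not blobs:
        # Assumption: an empty archive hashes to SHA-256 of the empty string.
        return sha256(b"").hex()
    level = [sha256(blob) for blob in blobs]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last node when a level has odd length.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()
```

Two instances holding the same blobs in the same order would compute the same root, so comparing roots is a cheap way to check whether two archives match before exchanging any content.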
Blocked by: #74
Once we have a good unique UUID/ULID ID scheme for Snapshots we can begin thinking about how to broadcast that with some metadata to other ArchiveBox instances / endpoints.
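As a rough illustration of what a ULID-style ID could look like, the sketch below packs a 48-bit millisecond timestamp and 80 random bits into a 26-character Crockford base32 string, following the public ULID layout. The function name `new_ulid` and its parameters are hypothetical, and this is not ArchiveBox's actual ID scheme.

```python
import os
import time
from typing import Optional

# Crockford base32 alphabet (no I, L, O, U), as used by the ULID layout.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None, randomness: Optional[bytes] = None) -> str:
    """Return a 26-character ULID-style string (hypothetical helper)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    if randomness is None:
        randomness = os.urandom(10)  # 80 random bits
    if len(randomness) != 10 or not 0 <= timestamp_ms < 2**48:
        raise ValueError("expected a 48-bit timestamp and 10 random bytes")
    # 48-bit timestamp in the high bits, 80-bit randomness in the low bits.
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])  # take 5 bits at a time
        value >>= 5
    return "".join(reversed(chars))
```

Because the timestamp occupies the most significant bits, IDs created later sort after earlier ones as plain strings, which is convenient when merging snapshot lists from several instances.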
Planned baby steps towards this goal in the far-far-future:
- Finalize ArchiveBox
- Add REST API endpoint to allow other services to POST new URLs and snapshots to ArchiveBox
- Add functionality for ArchiveBox to announce new snapshots to the world via RSS/webhooks/realtime endpoint of some kind:
  - REST webhook support, i.e. add the ability to configure ArchiveBox to ping outside endpoints whenever a new Snapshot/ArchiveResult is created
  - RSS feed support, i.e. publish an RSS feed on the ArchiveBox server of all recent snapshots (like Pocket does for your Pocket bookmarks)
- Add native ArchiveBox UI support for searching some of these global federation mechanisms on your own instance so that you can browse snapshots from other instances and providers without leaving your one unified UI
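The webhook and RSS steps above could be sketched roughly as follows, using only the Python standard library. Everything here is an assumption for illustration: the event name `snapshot.created`, the snapshot fields (`id`, `url`, `title`), and both function names are hypothetical and do not reflect an existing ArchiveBox API.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET


def build_webhook_request(endpoint: str, snapshot: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST announcing a new snapshot."""
    # Hypothetical payload shape; a real schema would need to be agreed on.
    body = json.dumps({"event": "snapshot.created", "snapshot": snapshot}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_rss_feed(title: str, link: str, snapshots: list[dict]) -> str:
    """Return a minimal RSS 2.0 document listing recent snapshots."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Recent snapshots"
    for snap in snapshots:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = snap["title"]
        ET.SubElement(item, "link").text = snap["url"]
        ET.SubElement(item, "guid").text = snap["id"]
    return ET.tostring(rss, encoding="unicode")
```

Sending the webhook would then be a matter of passing the request to `urllib.request.urlopen(req, timeout=10)` for each configured endpoint, ideally with retries and error handling that this sketch leaves out.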
External tools could then be developed that ingest this feed to publish ArchiveBox content on other platforms, e.g.:
- archivebox RSS -> proof-of-history blockchain e.g. Solana
- archivebox RSS -> bittorrent's magnet DHT and tracker sites
- archivebox RSS -> IFTTT/zapier/slack/zulip/etc. webhooks
Then later we can add functionality for ArchiveBox to publish snapshots/metadata to global lookup systems like proof-of-history blockchains (e.g. Solana), DHTs like the one BitTorrent's magnet system uses, distributed filesystems like IPFS, etc.