awesome-datahoarding's Introduction

Note: This is only a first draft/brainstorm. I will try to organize the list into more useful sections in the future.
Feel free to contribute!

Download utilities

Web Archiving

  • Collect: A server to collect & archive websites that also supports video downloads
  • grab-site: The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
  • Heritrix: Extensible, web-scale, archival-quality web crawler
  • HTTrack: Download a website from the Internet to a local directory
  • wail: Web Archiving Integration Layer: One-Click User Instigated Preservation
  • wikiteam: Set of tools for archiving wikis

General

  • annie: Youtube-DL alternative written in Golang
  • aria2: A lightweight multi-protocol & multi-source command-line download utility
  • CrowLeer: Powerful C++ web crawler based on libcurl
  • curl: Tool and library for transferring data with URL syntax, supporting many protocols
  • Plowshare: Command-line tool to manage file-sharing sites
  • Rclone: A command line program to sync files and directories to and from various cloud storage providers
  • wget: Utility for non-interactive download of files from the Web.
  • you-get: Dumb downloader that scrapes the web
  • Youtube-DL: A command-line program to download videos from YouTube and a few hundred more sites

Application-specific

  • ChanThreadWatch: Saves threads from *chan-style boards and checks for updates until the thread dies
  • floatplane_ripper: Script to rip all videos from https://floatplane.rip/
  • gallery-dl: Download image galleries and collections from pixiv, exhentai, danbooru and more
  • dzi-dl: Deep Zoom Image Downloader
  • FanFicFare: Tool for making eBooks from stories on fanfiction and other web sites
  • FicSave: Online fanfiction downloader
  • Google Images Download: Python script for downloading images
  • iiif-dl: Command-line tile downloader/assembler for IIIF endpoints/manifests
  • Instagram Scraper: Command-line application written in Python that scrapes and downloads an Instagram user's photos and videos. Use responsibly.
  • PyInstaLive: Instagram live stream downloader.
  • RedditDownloader: Scrapes Reddit to download media of your choice
  • Scribd-Downloader: Allows downloading of Scribd documents
  • RipMe: Album ripper for various websites. Runs on your computer. Requires Java 8.
  • yt-mango: Youtube metadata archiver
  • Youtube-MA: Youtube metadata archiver

Download automation

  • bazarr: Companion application to Sonarr and Radarr for downloading subtitles
  • FlexGet: Multipurpose automation tool for content like torrents, nzbs, podcasts, comics, series, movies, etc
  • Jackett: API support for torrent trackers (works with Sonarr, Radarr and others)
  • Lidarr: Music collection manager for Usenet and BitTorrent users
  • Mylar: An automated Comic Book downloader (cbr/cbz) for use with SABnzbd, NZBGet and torrents
  • Sick-Beard: PVR for newsgroup users (with limited torrent support)
  • Radarr: A fork of Sonarr to work with movies à la Couchpotato
  • Sonarr: PVR for Usenet and BitTorrent users

Compression

  • KGB Archiver: Compression tool with an unbelievably high compression rate
  • peazip: File archiver utility
  • PIGZ: Multi-threaded gzip
  • WinRAR: Can decompress RAR and zip files.

Network

  • NetLimiter: Internet traffic control and monitoring tool for Windows

File systems

File conversion

  • AAXtoMP3: Convert AAX files to common MP3, M4A, M4B, FLAC and Ogg formats through a basic bash script frontend to FFmpeg
  • html2warc: Convert web resources to a single WARC file

Utility Scripts

Content sharing

  • h5ai: HTTP web server index for Apache httpd, lighttpd, nginx and Cherokee
  • ipfs: Protocol and network designed to create a content-addressable, peer-to-peer method of storing and sharing hypermedia in a distributed file system
  • opds: Easy to use, Open & Decentralized Content Distribution

Data curation

  • baobab: Graphical disk usage analyzer
  • beets: Music library manager and MusicBrainz tagger
  • Calibre: Ebook manager
  • DeepSort: AI powered image tagger backed by DeepDetect
  • diskover: File system crawler, disk space usage, file search engine and file system analytics powered by Elasticsearch
  • Everything: Locate files and folders by name instantly (Windows)
  • FileBot: Tool for organizing and renaming your Movies, TV Shows and Anime
  • fucking-weeb: A library manager for animu (and TV shows, and whatever).
  • grepWin: A powerful and fast search tool using regular expressions (Windows)
  • jdupes: Powerful duplicate file finder
  • MediaElch: Media manager for Kodi
  • MediaInfo: Convenient unified display of the most relevant technical and tag data for video and audio files
  • Mp3tag: Powerful and easy-to-use tool to edit metadata of audio files (Windows/Mac)
  • phockup: Media sorting tool to organize photos and videos from your camera
  • picard: MusicBrainz tagger
  • TeraCopy: Copy your files faster and more securely
  • tree: The 'tree' command for Linux (recursive directory listing)
  • WinDirStat: Disk usage statistics viewer and cleanup tool for Windows
  • SyncToy: Microsoft tool for keeping files in sync across locations (Windows)
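
For readers curious what a duplicate file finder does conceptually, here is a minimal Python sketch. It is not how jdupes or any other tool listed above is implemented; it only illustrates one common approach, assuming that grouping files by size first and then by SHA-256 digest is good enough for the collection at hand. The names `file_digest` and `find_duplicates` are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (lists of paths) of files under root with identical contents."""
    # Step 1: group by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates(".")` returns a list of groups, each group being the sorted paths of files whose contents match. The sketch only reports duplicates and never deletes anything, so reviewing the groups by hand is left to you.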

APIs & Online tools

  • iqdb: Multi-service reverse image search
  • thetvdb: TV show metadata (used by Plex)

Hardware / Monitoring

  • CrystalDiskInfo: HDD/SSD utility that supports some USB, Intel RAID and NVMe devices
  • Hard Drive Sentinel: Multi-OS SSD and HDD monitoring and analysis software
  • smartmontools: Control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into most modern ATA/SATA, SCSI/SAS and NVMe disks

Data recovery

  • PhotoRec: Powerful FOSS file recovery tool (command-line, with an optional GUI called QPhotoRec)
  • TestDisk: FOSS command-line partition recovery tool by the author of PhotoRec
