repos: 235 gists: 0

Name: Internet Archive

Type: Organization

Bio: The Internet Archive is "the library of the Internet", and a big supporter of Free Software.

Twitter: internetarchive

Location: San Francisco

Blog: https://archive.org/

Internet Archive's Projects

acs4_py

Python interface to ACS4

ads-common

Common components and utilities for the Archiving & Data Services (ADS) team at the Internet Archive

analyze_ocr

Parse OCR result files for page numbers, tables of contents, etc.

annotate-client

The Hypothesis web-based annotation client.

annotate-client-build

This is used to store and update just the build directory of annotate-client.

annotate-pdf.js

PDF.js + Hypothesis viewer / annotator

arch

Web application for distributed compute analysis of Archive-It web archive collections.

archive-analysis

Tools to analyze web archives

archive-commons

archive-hocr-tools

Efficient hOCR tooling

archive-ocr-tools

archive-pdf-tools

Fast PDF generation and compression. Deals with millions of pages daily.

archive-wcd-etc-files

archiveorg-e2e-playwright

archiveorg-e2e-tests

archive.org e2e tests

archivespark

An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.

arklet

ARK minter, binder, resolver

aspectmock

The most powerful and flexible mocking framework for PHPUnit / Codeception.

autocrop

This package contains a tool for automatically cropping and deskewing images of book pages captured by an Internet Archive Scribe bookscanner.

bacon

Experimenting with Apache Pig.

bookreader

The Internet Archive BookReader

bookserver

Archive.org OPDS Bookserver - A standard for digital book distribution

brozzler

brozzler - distributed browser-based web crawler

btget

Command line retrieval of torrents using transmission-daemon (via transmission-remote)

cdx-summary

Summarize web archive capture index (CDX) files.

cdx-writer

Python script to create CDX index files of WARC data

certstream-go

Go library for connecting to CertStream

cgraphbot

Wikibase bot for updating identifiers and citation relationships

chocula

Journal-level metadata munging. Part of the fatcat project.

chromenomore404s

