
pirate / internet-archiving-talk


🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.

Home Page: https://pirate.github.io/internet-archiving-talk/



Archiving the Internet Before it All Rots Away

Slides and video for my talk about Internet Archiving.

Nick Sweeting (Co-Founder @ https://monadical.com) @theSquashSH / @pirate


About

Could you imagine an internet where all links stopped working after 4 years? All the old blogs from the ’90s… gone, all your hot takes on Twitter… gone, all the news and reporting… gone.

Some of that decay is good: no one wants the entire internet to be preserved for eternity. But most of it leads to great content disappearing forever, and to future generations being deprived of access to the most important medium for knowledge in the last half century. If no one worked on preserving that information, humanity would face a loss of knowledge many times greater than the burning of the Library of Alexandria.

Luckily, organizations like Archive.org and the International Internet Preservation Consortium work tirelessly every day to save what they can. But archiving doesn’t have to be exclusive to big organizations: we can all play a part by archiving the stuff that matters to us locally. Learn about the internet archiving community, the tools of the trade, and how to save content you care about in this talk!

Slides

Video

Rough Outline

  • 2 min: Self intro

    • name, company
    • founded in Colombia
    • poker -> consulting, fully remote in MTL and NYC now
  • 5 min: what got me into internet archiving

    • grew up with unreliable internet
    • censored internet
    • hostile environment for journalism and content
    • discovered wget
    • created pocket-archive-stream
  • 5 min: Equifax story

    • Equifax breach announced, site launched
    • cloned with pocket-archive-stream
    • rehosted and forgot about it
    • notified that Equifax had mistakenly posted links to my clone
    • goes viral, 2 million hits
    • only 2nd mention of wget in NYTimes history
  • 5 min: Intro to internet archiving tooling

    • wget is powerful
    • wget has many options and tunables
    • here are the ones I chose for ArchiveBox
    • demo
  • 5 min: Intro to internet archiving ecosystem

    • Why is preserving information important? why does humanity create libraries and museums?
    • How has it been done so far?
    • What types of archives end up surviving?
    • What are the benefits of decentralized vs. centralized archives?
  • 5 min: Why is internet archiving hard

    • Dynamic and interactive content
    • Private and paywalled content
    • Content ID and discovery, Base32 is hard
    • Dealing with the huge amount of data directly vs curating a smaller amount
    • Archive format longevity tradeoffs (WARC vs html / pdf)
  • 5 min: Setting up a Wikipedia clone

    • Set up a Kiwix server
    • Download your collections
    • Create an index and rehost it
  • 1 min: What can you do today to help save the internet?

    • Joining the ArchiveTeam task force & archive.org community
    • Running a local internet archive
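
The tooling segment of the outline notes that wget has many options and tunables, and mentions a chosen set for ArchiveBox. That exact list isn't reproduced here, so what follows is only a sketch, assuming a single-page capture that saves the page, its requisites (images, CSS, JS), and a WARC of the raw traffic. The flag selection is illustrative and not necessarily what ArchiveBox uses; the helper `build_wget_command` is hypothetical.

```python
import shlex
from urllib.parse import urlparse


def build_wget_command(url: str, warc_name: str, timeout: int = 60) -> list[str]:
    """Build (but do not run) a wget invocation that archives one page.

    Hypothetical helper: the flags below are an illustrative choice,
    not ArchiveBox's exact configuration.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {parsed.scheme!r}")
    return [
        "wget",
        "--no-verbose",
        "--page-requisites",              # also fetch images, CSS, and JS the page needs
        "--adjust-extension",             # save HTML/CSS with matching file extensions
        "--convert-links",                # rewrite links so the local copy works offline
        "--span-hosts",                   # allow requisites served from other hosts (CDNs)
        "--no-parent",                    # never climb above the start dir (recursive crawls)
        "--restrict-file-names=windows",  # avoid characters illegal on some filesystems
        f"--timeout={timeout}",
        "-e", "robots=off",               # ignore robots.txt: an ethical tradeoff
        f"--warc-file={warc_name}",       # also write <warc_name>.warc.gz of raw HTTP traffic
        url,
    ]


if __name__ == "__main__":
    cmd = build_wget_command("https://example.com/", "example")
    print(shlex.join(cmd))
    # To actually run it (requires wget on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning an argument list instead of a shell string keeps URLs with `&` or spaces from being reinterpreted by a shell, and makes the flag set easy to inspect or tweak before running it.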

Old outline: https://docs.sweeting.me/s/internet-archiving-talk
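
The outline lists content ID and discovery among the hard problems and notes that "Base32 is hard." One common approach, assumed here rather than taken from the talk, is to address each snapshot by a hash of its bytes encoded in RFC 4648 Base32 (A–Z and 2–7), which survives case-insensitive filesystems and URLs. Even this small sketch hits the sharp edges: the standard encoder emits uppercase and `=` padding, so IDs must be normalized consistently, and decoding has to restore the padding. The function names are hypothetical.

```python
import base64
import hashlib


def content_id(data: bytes, length: int = 52) -> str:
    """Derive a case-insensitive ID from a snapshot's bytes (hypothetical helper).

    SHA-256 yields 32 bytes = 256 bits; Base32 packs 5 bits per character,
    so the full digest is 52 characters (plus 4 '=' padding chars).
    """
    digest = hashlib.sha256(data).digest()
    encoded = base64.b32encode(digest).decode("ascii")
    # Strip padding and lowercase so the ID is safe in URLs and filenames.
    return encoded.rstrip("=").lower()[:length]


def decode_full_id(cid: str) -> bytes:
    """Recover the SHA-256 digest from a full-length (52-char) ID.

    b32decode rejects unpadded input, so re-add '=' up to a multiple of 8,
    and accept either case via casefold=True.
    """
    padded = cid.strip() + "=" * (-len(cid.strip()) % 8)
    return base64.b32decode(padded, casefold=True)
```

Truncating the ID (for example to 26 characters) makes it shorter to type but no longer decodable back to the digest, and raises the collision risk. That tradeoff between human-friendly IDs and robust content addressing is one reason the problem is harder than it looks.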

Resources
