Devilfish

Devilfish is an archiving utility for use with DEVONthink Pro. It simultaneously creates a single-page PDF snapshot of a web page and a local web archive of the page, and can optionally send archiving requests to external sites such as the Internet Archive.

Author: Michael Hucka
Repository: https://github.com/mhucka/devilfish
License: Unless otherwise noted, this content is licensed under the MIT License.

☀ Introduction

Web pages are ephemeral—here today, gone (or worse, changed) tomorrow. For researchers, this is anathema: we need to be able to document exactly what we read, when we read it, and potentially prove it at a later time. Currently the best web archiving facilities for general use are sites such as the Internet Archive, WebCite, and Archive.today, but for convenience and rapid access, keeping one's own local archives is a necessity. One of the research tools I use is DEVONthink Pro, a personal database and information management system for macOS, and I needed a convenient way to store not only an archive of a web page but also a page snapshot in PDF format. Devilfish is my solution.

Devilfish is meant to be bound to a keyboard shortcut and invoked while browsing the web in Safari or Google Chrome. When invoked, it does the following:

  • Prompts the user for a destination database in DEVONthink Pro and for a list of tags
  • Calls on DEVONthink Pro to create an archive of the current page in webarchive format
  • Calls on DEVONthink Pro to create a single-page PDF of the current page
  • Optionally, sends requests via network API to the Internet Archive, WebCite, and Archive.today
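
The last step above can be illustrated for one of the three services. The following is a minimal sketch in Python, not the Devilfish implementation: it assumes the Internet Archive's "Save Page Now" convention of appending the page URL to https://web.archive.org/save/, and the names `build_save_url` and `request_snapshot` are hypothetical. WebCite and Archive.today are left out because their request formats are not described here.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Assumption: the Wayback Machine "Save Page Now" endpoint accepts the
# target URL appended directly to this prefix.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_url(page_url: str) -> str:
    """Return the URL that asks the Internet Archive to snapshot page_url."""
    parts = urlsplit(page_url)
    # Reject anything that is not an absolute http(s) URL.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {page_url!r}")
    return SAVE_ENDPOINT + page_url


def request_snapshot(page_url: str, timeout: float = 30.0) -> int:
    """Send the archiving request and return the HTTP status code.

    Hypothetical helper: it performs a real network request when called.
    """
    req = Request(
        build_save_url(page_url),
        headers={"User-Agent": "archive-sketch/0.1"},  # illustrative value
    )
    with urlopen(req, timeout=timeout) as response:
        return response.status
```

Calling `build_save_url("https://example.com/")` only constructs the request URL; `request_snapshot` is the part that contacts the service, so it is best treated as optional, as in the list above.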

The web archive is not stored in DEVONthink Pro but rather in a folder in the user's home directory. The PDF is kept in DEVONthink; the URL of the web page is stored in the document's URL field, and the PDF is annotated with a Spotlight comment containing the path to the (external) web archive file. This combination avoids duplication and excessive growth of the user's DEVONthink database, while still allowing the user to take advantage of DEVONthink's full-text PDF search, annotation, and other capabilities, and to keep a backup copy of the original page source as a precaution. The web archive storage folder can also be placed elsewhere, such as on an external drive or an IPFS location.
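
The relationship between the PDF record and the external web archive can be sketched as a small data model. This is an illustrative Python sketch under stated assumptions, not how Devilfish is written: the `PdfRecord` type, the `archive_path` and `describe_snapshot` helpers, the file-naming rule, and the example folder are all hypothetical.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class PdfRecord:
    """Hypothetical stand-in for the metadata kept on the PDF in DEVONthink."""
    name: str     # document title
    url: str      # value of the document's URL field
    comment: str  # Spotlight comment: path to the external web archive


def archive_path(archive_root: PurePosixPath, title: str) -> PurePosixPath:
    """Derive a file name for the web archive from the page title.

    Assumption: runs of characters outside [A-Za-z0-9._-] become one hyphen.
    """
    safe = re.sub(r"[^A-Za-z0-9._-]+", "-", title).strip("-") or "untitled"
    return archive_root / f"{safe}.webarchive"


def describe_snapshot(url: str, title: str, archive_root: PurePosixPath) -> PdfRecord:
    """Build the PDF metadata that points back to the external web archive."""
    path = archive_path(archive_root, title)
    return PdfRecord(name=title, url=url, comment=str(path))


record = describe_snapshot(
    "https://example.com/",
    "Example Domain: Home",
    PurePosixPath("/Users/me/WebArchives"),  # hypothetical storage folder
)
print(record.comment)
```

The point of the design is that the database holds only the searchable PDF plus a pointer, so moving the archive folder to another drive only changes the path recorded in the comment.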

The name "Devilfish" for this software is inspired by loosely combining "DEVONthink" and "fishing", as in fishing for information. (By the way, the real devil fish—more properly known as Mobula mobular or the giant devil ray—is an endangered species due to fishing and habitat destruction. Please read more about them to become more informed and help preservation efforts before they are driven to extinction.)

⁇ Getting help and support

If you find an issue, please submit it in the GitHub issue tracker for this repository.

♬ Contributing — info for developers

I would be happy to receive your help and participation if you are interested. Everyone is asked to read and respect the code of conduct when participating in this project.

🏛 Copyright and license

Copyright (c) 2018 by Michael Hucka and the California Institute of Technology.

☺ Acknowledgments

The illustration of a giant devil ray used on this page came from Wikimedia. It was originally created by H. Gervais for the 1877 book Les Poissons by H. Gervais and R. Boulart.
