Keres

Keres is a simple, quick, all-in-one dockerized solution for harvesting websites. It bundles several services, including Web Curator Tool (WCT), Heritrix, and pywb.

Keres provides default configurations so you can start harvesting immediately without having to set up users, permissions, templates, profiles, bandwidths, etc. Simply log in to Web Curator Tool and start harvesting.

To use the default configuration and database that ship with WCT instead, set DB_DEFAULT to default in the .env file.
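
As a minimal illustration, the corresponding entry in .env would look like the line below. This assumes the file uses the usual KEY=value syntax read by docker-compose; all other entries stay as they are.

```shell
# .env (excerpt): use the default configuration and DB that ship with WCT
DB_DEFAULT=default
```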

Setup

  1. Clone this repo and enter the directory
  2. Copy the sample.env file to .env
  3. Review the .env file and change parameters as needed
  4. If you don't already have it, install the latest version of Docker
  5. If you don't already have it, install the latest version of docker-compose
  6. Follow the Docker post-install steps if you're not using the root user with Docker
  7. Run docker-compose up -d
  8. Wait a bit for all services to start up
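
The steps above can be sketched as a small shell helper, meant to be run from inside the cloned repository. The names `run`, `keres_setup`, and `DRY_RUN` are hypothetical and not part of Keres. Cloning (step 1) and editing .env (step 3) are left as manual steps.

```shell
# Sketch of setup steps 2, 4, 5, 7 and 8.
# Set DRY_RUN=1 to print the commands instead of executing them.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

keres_setup() {
  # Step 2: create .env from the sample without overwriting an existing one.
  [ -f .env ] || run cp sample.env .env

  # Steps 4-5: confirm that Docker and docker-compose are installed.
  run docker --version
  run docker-compose --version

  # Step 7: start all services in the background.
  run docker-compose up -d

  # Step 8: list the containers to see whether they have started.
  run docker-compose ps
}
```

Calling `DRY_RUN=1 keres_setup` first shows what would be executed; calling `keres_setup` on its own then performs the steps with whatever values .env currently holds.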

Services are available at:

Service    Address
WCT        http://localhost:8080/wct
Heritrix   https://localhost:8443/engine
pywb       http://localhost:9080
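
Since the services need some time to start, one way to check availability is to poll each address until it answers. The sketch below assumes `curl` is installed; the function names `probe` and `wait_for` and the default of 30 attempts at 5-second intervals are illustrative choices, not part of Keres.

```shell
# Return 0 if the URL sends back any HTTP response at all.
# -k skips certificate verification, an assumption made because
# the Heritrix address uses https on localhost.
probe() {
  curl -ks -o /dev/null --max-time 5 "$1"
}

# wait_for URL [ATTEMPTS] [DELAY_SECONDS]
# Poll URL until it answers or the attempts run out.
wait_for() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if probe "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out: $url" >&2
  return 1
}
```

For example, `wait_for http://localhost:8080/wct` blocks until WCT responds or the attempts are exhausted, and the same call can be repeated for the Heritrix and pywb addresses. Any HTTP response counts as "up", including an authentication challenge.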

Harvesting

Log in to WCT

The login credentials for the out-of-the-box WCT user are:

Username: bootstrap
Password: XCgsDrQCHgAck0Wg

After logging in, you can change the default password in the WCT UI, or add new users and disable the bootstrap user.

Configure harvest settings

Keres ships with a custom, fine-tuned profile. Alternatively, you can create a new profile, which will then use the default Heritrix 3 settings.

NOTE: It is advised to change the metadata.operatorContactUrl field in the Heritrix profile after installation to reflect your real contact URL.

Run a harvest

To start harvesting, you can go straight to Targets. See this documentation on how to start a harvest. After a target instance is harvested, you can review it with "Review this harvest", an internal WCT playback tool. If you are satisfied with the harvest, endorse it and submit it to the archive, after which it will be available in pywb.

pywb checks for new harvests to index every 30 seconds. A large harvest made in WCT may take a while to be indexed in pywb, but indexing is generally very quick.
