
Sophox

Installation

Full planet Sophox should be installed on a sufficiently large server (40+ GB RAM, 1 TB disk), preferably with an NVMe SSD. On Google Cloud, a local SSD scratch disk is also recommended. Use environment variables to override which data gets loaded. See also the Development section below.

The server must have bash, docker, curl, and git. Everything else is loaded inside docker containers.

When cloning, make sure you also get the submodules (e.g. git submodule update --init --recursive).

Google Cloud

  • Create a custom-6-39936 VM (6 vCPUs, 39 GB RAM) or better with a 15 GB boot disk, and attach a 1 TB persistent SSD disk.
  • Set the VM startup script to the following line; the service should be ready in two to three days. Insert any env var overrides right before it, e.g. export SOPHOX_HOST=example.org; curl ... | bash
curl --silent --show-error --location --compressed https://raw.githubusercontent.com/Sophox/sophox/main/docker/startup.gcp.sh | bash
  • You can view Traefik's dashboard with statistics and configuration at http://localhost:8080 by creating a tunnel to the VM instance (adjust VM name and zone):
gcloud compute ssh sophox-instance --zone=us-central1-b -- -L 8080:localhost:8080
  • To monitor the startup process, ssh into the server and view the startup script output:
sudo journalctl -u google-startup-scripts.service

Hetzner or similar server

We used to have a machine with 12 CPUs, 128 GB RAM, and 1.8 TB SSD.

  • Using robot UI, rescue reboot with a public key, and apply firewall template "Webserver". Reboot.
  • ssh root@<IP>
  • run installimage
  • Choose -ubuntu 18.04
  • In the config file, comment out the 3rd (large) disk, set SWRAIDLEVEL 1, and hit F10. Once formatting is done, use shutdown -r now to reboot.
  • ssh root@<IP>
# Install utils and docker
apt update && apt upgrade
apt-get install -y apt-transport-https ca-certificates curl git software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -

# You may need to use "bionic" instead of `lsb_release ...` 
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

apt-cache policy docker-ce
apt update && apt-get install -y docker-ce

# Mount the large disk and make it auto-mount. The formatting command itself is not shown here,
# so the disk must already carry a filesystem. We use xfs, but ext4 is fine too (adjust the fstab entry to match).
mkdir -p /mnt/data && mount -o discard,defaults /dev/sdc /mnt/data
echo "UUID=$(blkid -s UUID -o value /dev/sdc) /mnt/data xfs discard,defaults,nofail 0 2" | tee -a /etc/fstab
  • Install Sophox:
cd /mnt/data
export DATA_DIR=$PWD
export REPO_BRANCH=main
nohup curl --fail --silent --show-error --location --compressed \
   https://raw.githubusercontent.com/Sophox/sophox/${REPO_BRANCH}/docker/startup.planet.sh \
   | bash >> $DATA_DIR/startup.log 2>&1 &

Monitoring

  • See docker statistics: docker stats
  • View docker containers: docker ps
  • See an individual container's log: docker logs <container-id> (the ID can be just its first few characters)
  • localhost:8080 shows Traefik's configuration and statistics.

Automated Installation Steps

These steps are done automatically by the startup scripts. Many of the steps create empty status files in the data/status directory, each indicating that a specific step is complete. This prevents a full rebuild when the server is restarted.
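The status-file idea can be sketched as a small helper. This is only an illustration of the pattern, not the code used by the Sophox startup scripts: the run_once function, the STATUS_DIR variable, and its default location are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a step once and record completion in an empty status file.
# STATUS_DIR and run_once are illustrative names, not taken from the Sophox scripts.
STATUS_DIR="${STATUS_DIR:-${DATA_DIR:-.}/status}"

run_once() {
  local step="$1"
  shift
  local flag="${STATUS_DIR}/${step}"
  if [ -f "${flag}" ]; then
    echo "skip ${step}: ${flag} exists"
    return 0
  fi
  mkdir -p "${STATUS_DIR}"
  # Create the flag only if the command succeeds, so a failed step is retried next time.
  "$@" && touch "${flag}"
}
```

With such a helper, a call like run_once file.downloaded some_download_command would do the work on the first run and return immediately on later runs; deleting the status file forces the step to run again.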

  • Clone/pull Sophox git repo (Use REPO_URL and REPO_BRANCH to override. Set REPO_URL to "-" to disable)
  • Generate random Postgres password
  • Download OSM dump file and validate md5 sum. (creates status/file.downloaded)
  • Initialize Osmosis state configuration / timestamp (needed for osm2pgsql updates)
  • Start PostgreSQL and Blazegraph with dc-db-*.yml and wait for them to activate
  • Run all dc-importers-*.yml to parse the downloaded file into RDF TTL files and into Postgres tables. The TTL files are then imported into Blazegraph. This step runs without --detach and should take a few days to complete. Running it a second time should finish almost immediately. Note that if it crashes, you may have to do some manual cleanup (e.g. wipe everything and start over)
  • Run dc-updaters-*.yml and dc-services-*.yml. Updaters will update OSM data -> PostgreSQL tables (geoshapes), OSM data -> Blazegraph, and OSM Wiki -> Blazegraph.
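The "wait for them to activate" step above amounts to polling a readiness check. A minimal sketch of such a loop follows, assuming a hypothetical wait_for helper; the real startup scripts may do this differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a readiness check until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "${i}" -le "${attempts}" ]; do
    # The check's own output is discarded; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep "${delay}"
    i=$((i + 1))
  done
  echo "gave up after ${attempts} attempt(s)" >&2
  return 1
}
```

For example, wait_for 60 5 docker exec <postgres-container> pg_isready would poll PostgreSQL every 5 seconds for up to 5 minutes; the container name is a placeholder to fill in from docker ps.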

GCP has an additional disk initialization step that runs before startup.sh:

  • If DATA_DEV is set, format and mount it as DATA_DIR. Same applies to the optional TEMP_DEV + TEMP_DIR. (e.g. /dev/sdb as /mnt/disks/data, and /dev/nvme0n1 as /mnt/disks/temp)
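The disk step above might look roughly like the following. This is a sketch under stated assumptions, not the actual GCP script: the init_disk and run helpers, the DRY_RUN switch, the choice of ext4, and the "format only when blkid finds no filesystem" rule are all assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: format a device (if it has no filesystem yet) and mount it.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

init_disk() {
  local dev="$1"
  local dir="$2"
  if [ -z "${dev}" ]; then
    echo "no device set, skipping ${dir}"
    return 0
  fi
  # blkid exits non-zero when it finds no filesystem signature on the device.
  # Assumption: an unformatted device gets ext4; the real script may choose differently.
  if ! blkid "${dev}" >/dev/null 2>&1; then
    run mkfs.ext4 -F "${dev}"
  fi
  run mkdir -p "${dir}"
  run mount -o discard,defaults "${dev}" "${dir}"
}

init_disk "${DATA_DEV:-}" "${DATA_DIR:-/mnt/disks/data}"
init_disk "${TEMP_DEV:-}" "${TEMP_DIR:-/mnt/disks/temp}"
```

Running it with DATA_DEV=/dev/sdb set prints the mkfs, mkdir, and mount commands for review; only DRY_RUN=0 would execute them, and formatting destroys any data on the device.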

Development

Clone the repo with submodules.

If you have commit access to the Sophox repository, make sure to run this in order to automatically use ssh instead of https for submodules.

git config --global url.ssh://git@github.com/.insteadOf https://github.com/

For testing, you may want to create a simple script (example below) in the docker directory, e.g. docker/_belize.sh, that uses docker/startup.local.sh to run Sophox locally with a small OSM file. Use http://sophox.localhost to browse it. You may need to add 127.0.0.1 sophox.localhost to your hosts file. Make sure the script's name begins with an underscore (such files are ignored by git).

#!/usr/bin/env bash

OSM_FILE=belize-latest.osm.pbf
OSM_FILE_REGION=central-america
MAX_MEMORY_MB=5000

### Uncomment any of these to disable a certain service/feature
# ENABLE_IMPORT_OSM2PGSQL=
# ENABLE_IMPORT_OSM2RDF=
# ENABLE_IMPORT_PAGEVIEWS=
# ENABLE_SVC_PROXY=
# ENABLE_SVC_GUI=
# ENABLE_SVC_MISC=
# ENABLE_UPDATE_METADATA=
# ENABLE_UPDATE_OSM2PGSQL=
# ENABLE_UPDATE_OSM2RDF=
# ENABLE_UPDATE_PAGEVIEWS=
# ENABLE_UPDATE_USAGESTATS=
# ENABLE_UPDATE_MAINTAIN=
# ENABLE_UPDATE_RELLOC=

source "$(dirname "$0")/startup.local.sh"

Notes for Mac users

  • Make sure to set MAX_MEMORY_MB, because the free utility is not available on macOS.
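One way to pick a value is to derive it from the machine's total memory. The sketch below is an assumption-laden illustration: the total_memory_mb helper and the 75% headroom factor are hypothetical and not part of Sophox. On a Mac, the value should also not exceed the memory allotted to Docker Desktop, which this script cannot see.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report total RAM in MB on macOS or Linux without using `free`.
total_memory_mb() {
  case "$(uname -s)" in
    Darwin)
      # hw.memsize is reported in bytes.
      echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 ))
      ;;
    *)
      # MemTotal in /proc/meminfo is reported in kB.
      awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
      ;;
  esac
}

# Keep an explicit MAX_MEMORY_MB if one is set; otherwise use 75% of total RAM.
# The 75% figure is an arbitrary choice for this sketch.
MAX_MEMORY_MB="${MAX_MEMORY_MB:-$(( $(total_memory_mb) * 75 / 100 ))}"
export MAX_MEMORY_MB
echo "MAX_MEMORY_MB=${MAX_MEMORY_MB}"
```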

Troubleshooting

Use docker stats and docker logs to monitor the services. The Blazegraph Java service is potentially the most problematic, as it requires a large amount of RAM and CPU and does most of the indexing work. If it is overloaded, try stopping the containers that use it (the various updaters). You may also temporarily suspend traefik to prevent new user queries.

Known issues

  • The sophox_osm2rdf-update_... service can fall behind when updating data from OSM. Try stopping it, waiting until Blazegraph's CPU usage falls to 0%, and then starting it again.
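The stop, wait, and restart workaround above can be scripted. The following is a rough sketch, not a tested operational tool: the cpu_percent and restart_when_idle helpers are hypothetical, the container names must be supplied by the caller (look them up with docker ps), and the default threshold of 1% stands in for "0% CPU".

```shell
#!/usr/bin/env bash
# Hypothetical sketch: stop an updater, wait until another container is idle, start it again.
# Container names are passed as arguments; none are hard-coded here.

cpu_percent() {
  # docker stats prints values such as "12.34%"; strip the percent sign.
  docker stats --no-stream --format '{{.CPUPerc}}' "$1" | tr -d '%'
}

# Usage: restart_when_idle <updater-container> <busy-container> [threshold-percent] [delay-seconds]
restart_when_idle() {
  local updater="$1"
  local busy="$2"
  local threshold="${3:-1}"
  local delay="${4:-60}"
  docker stop "${updater}"
  # awk does the floating-point comparison; empty input counts as "not idle yet".
  while ! cpu_percent "${busy}" |
    awk -v t="${threshold}" 'NR == 1 { ok = ($1 < t) } END { exit !ok }'; do
    sleep "${delay}"
  done
  docker start "${updater}"
}
```

A call such as restart_when_idle <updater-container> <blazegraph-container> 1 60 would check once a minute and restart the updater as soon as the Blazegraph container reports under 1% CPU.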
