
geofurlong / builder


GeoFurlong dataset builder.

Home Page: https://www.geofurlong.com

License: Other

Go 81.05% Python 15.37% PLpgSQL 3.58%
gb geospatial railway data engineering infrastructure geocode track


GeoFurlong

Geospatial resources for mainline (Network Rail) railways in Britain.

Description

This repository contains the source-code for the builder process, which transforms multiple geospatial data sources into optimised datasets for client application use. These output datasets are stored on Dropbox and summarised in the Data Catalogue section below.

The output datasets provide ready-made geocoded data for railway locations, each formed of an Engineer's Line Reference (ELR) and an associated mileage (or kilometreage) on that line. Other features can be reverse-geocoded against these precomputed geographic positions to establish their railway ELR and mileage, using standard nearest-neighbour or point-in-polygon spatial analysis in GIS tools or software libraries.
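
As an illustration of nearest-neighbour reverse geocoding, the sketch below scans a slice of precomputed positions for the one closest to a query point in OS Easting / Northing. It is a minimal brute-force example, not part of this repository: the Position type and Nearest function are hypothetical names, and a real application would normally use a spatial index or a GIS library instead of a linear scan.

```go
package main

import (
	"fmt"
	"math"
)

// Position is a subset of the columns in a precomputed table (hypothetical type).
type Position struct {
	ELR        string
	TotalYards int
	Easting    float64 // metres, OS National Grid
	Northing   float64 // metres, OS National Grid
}

// Nearest returns the position closest to (e, n) and its planar distance in
// metres. The boolean is false when positions is empty.
func Nearest(positions []Position, e, n float64) (Position, float64, bool) {
	if len(positions) == 0 {
		return Position{}, 0, false
	}
	best, bestD := positions[0], math.Inf(1)
	for _, p := range positions {
		if d := math.Hypot(p.Easting-e, p.Northing-n); d < bestD {
			best, bestD = p, d
		}
	}
	return best, bestD, true
}

func main() {
	positions := []Position{
		{"ECM1", 5654, 531412.3, 187912.1}, // sample row from the Precomputed Schema
		{"ECM1", 5676, 531400.0, 187930.0}, // made-up neighbour for illustration
	}
	p, d, _ := Nearest(positions, 531410.0, 187915.0)
	fmt.Printf("%s at %d yards (%.1f m away)\n", p.ELR, p.TotalYards, d)
}
```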

Sample applications will be shared via separate repositories which utilise the geocoded datasets, dynamic geocoding (at several thousand points per second), and reverse geocoding for mobile applications.

๐ŸŒ geofurlong.com is built using the output of this project, with interactive mapping of routes and junction areas, tabulated output detailing railway attributes, and spatial relationship with populated places and government administrative boundaries.

Preface

GeoFurlong uses a total yards value to define the reported linear distance along the railway, commonly referred to as a mileage, even for kilometre-based ELRs. This is a signed whole number which avoids the pitfalls associated with attempting to store mileage in the many permutations of decimal miles, fractional miles, or text format. These pitfalls are amplified when dealing with negative mileages. The total yards unit is unambiguous, and efficient for sorting, filtering, and storage within systems.

GeoFurlong is opinionated and consistent in its textual presentation of mileages. For example, a mileage of 86 miles, 7 yards is presented as 86M 0007y.
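
The conversion from total yards to this text form can be sketched as follows. FormatMileage is a hypothetical helper, not the project's own function. It covers mile-based values only, and placing a minus sign in front of negative mileages is an assumption, as the presentation of negative values is not described here.

```go
package main

import "fmt"

const yardsPerMile = 1760

// FormatMileage renders a total-yards value as miles and zero-padded yards.
// The leading minus for negative values is an assumption.
func FormatMileage(totalYards int) string {
	sign := ""
	if totalYards < 0 {
		sign = "-"
		totalYards = -totalYards
	}
	return fmt.Sprintf("%s%dM %04dy", sign, totalYards/yardsPerMile, totalYards%yardsPerMile)
}

func main() {
	fmt.Println(FormatMileage(86*yardsPerMile + 7)) // 86M 0007y
	fmt.Println(FormatMileage(5654))                // 3M 0374y
}
```

Keeping the signed whole number as the stored value and formatting only at the point of display means sorting and filtering never depend on parsing text.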

Recording of geographic position is precise to one decimal place for Ordnance Survey Easting / Northing (i.e. 100 mm) and six decimal places for Longitude / Latitude (approximately 110 mm in Britain).

Linear accuracy is defined as the geographic measured distance minus the reported distance, both in metres. For example, if the measured distance between neighbouring quarter mileposts along an ELR centre-line was 403.836 metres, the accuracy would be calculated as +1.5 metres (a quarter mile being 440 yards, or 402.336 metres). This is an example of what is commonly referred to as a long quarter mile. The linear accuracy, computed to the maximum available decimal places, is used to produce the linear calibration statistics per ELR; it is subsequently truncated to a whole number for presentation in other datasets.
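
The arithmetic in the quarter-mile example can be reproduced with a few lines of Go. LinearAccuracy is a hypothetical name used only for this sketch. The conversion factor is the exact international yard (0.9144 metres).

```go
package main

import (
	"fmt"
	"math"
)

const metresPerYard = 0.9144

// LinearAccuracy returns the measured distance minus the reported distance, in metres.
func LinearAccuracy(measuredM float64, reportedYards int) float64 {
	return measuredM - float64(reportedYards)*metresPerYard
}

func main() {
	acc := LinearAccuracy(403.836, 440) // a "long" quarter mile
	// Truncation toward zero gives the whole-number form used for presentation.
	fmt.Printf("%+.1f m, presented as %+d m\n", acc, int(math.Trunc(acc)))
}
```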

The computed geographic position for a defined ELR and mileage may not be accurate in all instances. In a number of locations, the position may be incorrect by a significant linear distance, particularly on closed or partially-closed lines. The manually-maintained ELR dataset (via the remarks column) identifies ELRs which exhibit potentially poor accuracy.

The build process computes the estimated linear position for a given mileage on an ELR by calibrating against mileposts on that ELR. For each ELR, calibration is undertaken using the virtual centre-line geometry and the reported start and finish mileages, combined with the milepost positions and values. The computed geographic distance along each segment between mileposts is compared against the reported mileages for those mileposts and recorded in a detailed calibration statistics database. This calibration process allows an estimation of the linear accuracy to be provided when geocoding from ELR and mileage to geographic position.
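
One way to picture calibration is as a table of (reported mileage, measured distance along the centre-line) pairs, with positions between mileposts estimated by interpolation. The sketch below assumes simple piecewise-linear interpolation between neighbouring calibration points. That choice, and the CalPoint and DistanceAlong names, are illustrative assumptions and not a description of the builder's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// CalPoint pairs a reported mileage with the measured distance along the
// centre-line at which it was observed, e.g. at a milepost (hypothetical type).
type CalPoint struct {
	TotalYards int     // reported mileage
	DistM      float64 // measured metres from the start of the centre-line
}

// DistanceAlong linearly interpolates the centre-line distance for a mileage.
// cal must be sorted by TotalYards with strictly increasing values.
func DistanceAlong(cal []CalPoint, yards int) (float64, error) {
	if len(cal) < 2 {
		return 0, errors.New("need at least two calibration points")
	}
	if yards < cal[0].TotalYards || yards > cal[len(cal)-1].TotalYards {
		return 0, errors.New("mileage outside calibrated range")
	}
	// Index of the first calibration point at or beyond the target mileage.
	i := sort.Search(len(cal), func(k int) bool { return cal[k].TotalYards >= yards })
	if i == 0 {
		return cal[0].DistM, nil
	}
	a, b := cal[i-1], cal[i]
	t := float64(yards-a.TotalYards) / float64(b.TotalYards-a.TotalYards)
	return a.DistM + t*(b.DistM-a.DistM), nil
}

func main() {
	// Made-up calibration: the second quarter mile is "long" by 1.5 m.
	cal := []CalPoint{{0, 0}, {440, 402.336}, {880, 806.172}}
	d, err := DistanceAlong(cal, 660)
	fmt.Println(d, err)
}
```

Interpolating within each milepost-to-milepost segment confines the effect of a long or short segment to that segment, so it does not accumulate along the whole ELR.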

Noting the linear calibration process described above, inaccuracies in estimating the geographic position of a mileage on an ELR can result from individual or combined factors which are outside the control of this project, including:

  • ELR: Incorrect geometry; incorrect start / finish reported mileage; remodelled track layout.
  • Milepost: Incorrect position; incorrect identifier; not recorded / physically missing.

โš ๏ธ Given these credible risks to positional accuracy, the GeoFurlong datasets must not be used for safety-critical decisions, nor relied upon as the primary geographic data source in production environments.

Published Data Files

ELR Schema

Column | Description | Unit / Type | Sample
elr | ELR | text | WCM1
l_system | Linear Reporting Unit | text (M or K) | M
shape_length_m | Geographic Length | metres | 135756.658175
total_yards_from | Mileage From | total yards (whole number) | -216
total_yards_to | Mileage To | total yards (whole number) | 148224
route | Route | text | West Coast Main Line (WCML)
section | Section | text (optional) | Carlisle to Law Jn
remarks | Remarks | text (optional) |
quail_book | TrackMap Book | text (; separated) | 1;4;2
grouping | Grouping | text (; separated) | LEC1;LEC2;LEC3; ...
neighbours | Neighbours | text (; separated) | CGJ7;CSP;ECA1; ...

Precomputed Schema

Column | Description | Unit / Type | Sample
elr | ELR | text | ECM1
total_yards | Mileage | total yards (whole number) | 5654
mileage | Mileage | text | 3M 0374y
easting | OS Easting | metres (1 decimal place) | 531412.3
northing | OS Northing | metres (1 decimal place) | 187912.1
longitude | Longitude | degrees (6 decimal places) | -0.105067
latitude | Latitude | degrees (6 decimal places) | 51.574767
osgr | OS Grid Reference | text | TQ3141287912
accuracy | Linear Accuracy | metres (whole number) | -2
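
The osgr column is related to the easting and northing columns by the standard OS National Grid lettering scheme. The sketch below shows that derivation for a 10-figure (one metre) reference. GridRef is a hypothetical name, and truncating coordinates to whole metres is an assumption about how the published values are produced, although it reproduces the sample row above.

```go
package main

import "fmt"

// GridRef converts OS National Grid easting/northing (metres) to a 10-figure
// grid reference. Coordinates are truncated to whole metres (an assumption).
func GridRef(easting, northing float64) (string, error) {
	if easting < 0 || easting >= 700000 || northing < 0 || northing >= 1300000 {
		return "", fmt.Errorf("(%v, %v) is outside the National Grid", easting, northing)
	}
	e, n := int(easting), int(northing)
	e100k, n100k := e/100000, n/100000

	// Letter indices for the 500 km and 100 km squares. The grid alphabet
	// runs A to Z without I, hence the adjustment for indices above 7.
	l1 := (19 - n100k) - (19-n100k)%5 + (e100k+10)/5
	l2 := (19-n100k)*5%25 + e100k%5
	if l1 > 7 {
		l1++
	}
	if l2 > 7 {
		l2++
	}
	return fmt.Sprintf("%c%c%05d%05d", 'A'+l1, 'A'+l2, e%100000, n%100000), nil
}

func main() {
	ref, err := GridRef(531412.3, 187912.1)
	fmt.Println(ref, err) // TQ3141287912 <nil>
}
```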

Gazetteer Schema

Column | Description | Unit / Type | Sample
elr | ELR | text | ECM1
total_yards | Mileage | total yards (whole number) | 5654
mileage | Mileage | text | 3M 0374y
easting | OS Easting | metres (1 decimal place) | 531412.3
northing | OS Northing | metres (1 decimal place) | 187912.1
longitude | Longitude | degrees (6 decimal places) | -0.105067
latitude | Latitude | degrees (6 decimal places) | 51.574767
osgr | OS Grid Reference | text | TQ3141287912
accuracy | Linear Accuracy | metres (whole number) | -2
nr_region | Network Rail Region | text | Eastern
place_name | Nearest Populated Place | text | Stroud Green
district | Nearest Populated Place's District | text | Haringey
county_unitary | Nearest Populated Place's County | text | Greater London
distance_m | Distance to nearest Populated Place | metres (whole number) | 540
country | Country | text | England
admin_area | Administrative Area | text | Haringey

Aggregated Gazetteer

At the maximum resolution of 22 yards, the gazetteer table consists of over 850,000 entries. An alternative method of establishing the geographic context of railway positions is provided by grouping the following attributes into mileage ranges: Network Rail Region, Government Administrative Area, and nearest Populated Place (and its corresponding County / District). The resulting tables, each of which has a significantly reduced number of entries, are:

  • geofurlong_gazetteer_by_nr_region.csv
  • geofurlong_gazetteer_by_country_admin_area.csv
  • geofurlong_gazetteer_by_nearest_place.csv
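
The grouping idea can be sketched as a single pass that merges consecutive gazetteer entries on the same ELR sharing the same attribute value into one mileage range. The Entry, Range and Aggregate names are hypothetical, and the column layout of the published aggregated files is not described here, so this illustrates the technique only and not the file format.

```go
package main

import "fmt"

// Entry is a gazetteer row reduced to the fields needed for grouping.
type Entry struct {
	ELR        string
	TotalYards int
	Value      string // e.g. nr_region, admin_area or place_name
}

// Range is a run of consecutive entries on one ELR sharing the same value.
type Range struct {
	ELR       string
	FromYards int
	ToYards   int
	Value     string
}

// Aggregate collapses entries (sorted by ELR, then TotalYards) into ranges.
func Aggregate(entries []Entry) []Range {
	var out []Range
	for _, e := range entries {
		if n := len(out); n > 0 && out[n-1].ELR == e.ELR && out[n-1].Value == e.Value {
			out[n-1].ToYards = e.TotalYards // extend the current run
			continue
		}
		out = append(out, Range{e.ELR, e.TotalYards, e.TotalYards, e.Value})
	}
	return out
}

func main() {
	// Made-up entries at 22 yard intervals.
	entries := []Entry{
		{"ECM1", 0, "Place A"},
		{"ECM1", 22, "Place A"},
		{"ECM1", 44, "Place B"},
	}
	for _, r := range Aggregate(entries) {
		fmt.Printf("%s %d to %d: %s\n", r.ELR, r.FromYards, r.ToYards, r.Value)
	}
}
```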

Data Catalogue

Filename | Description | Record Count | File Size
geofurlong_elr.csv | ELR master list | 1,589 | 187.2 KB
geofurlong_elr_metric.csv | ELR (metric) | 19 | 92.0 B
geofurlong_precomputed_0022y.csv | Geographic positions at 22 yard intervals | 884,780 | 62.4 MB
geofurlong_precomputed_0110y.csv | Geographic positions at 110 yard intervals | 179,632 | 12.7 MB
geofurlong_precomputed_0220y.csv | Geographic positions at 220 yard intervals | 91,536 | 6.4 MB
geofurlong_precomputed_0440y.csv | Geographic positions at 440 yard intervals | 47,515 | 3.3 MB
geofurlong_precomputed_1760y.csv | Geographic positions at 1760 yard intervals | 14,469 | 1.0 MB
geofurlong_precomputed_8800y.csv | Geographic positions at 8800 yard intervals | 5,722 | 407.3 KB
geofurlong_gazetteer_0022y.csv | Gazetteer at 22 yard intervals | 884,780 | 73.3 MB
geofurlong_gazetteer_0110y.csv | Gazetteer at 110 yard intervals | 179,632 | 14.9 MB
geofurlong_gazetteer_0220y.csv | Gazetteer at 220 yard intervals | 91,536 | 7.6 MB
geofurlong_gazetteer_0440y.csv | Gazetteer at 440 yard intervals | 47,515 | 3.9 MB
geofurlong_gazetteer_1760y.csv | Gazetteer at 1760 yard intervals | 14,469 | 1.2 MB
geofurlong_gazetteer_8800y.csv | Gazetteer at 8800 yard intervals | 5,722 | 478.9 KB
geofurlong_gazetteer_by_nr_region.csv | Gazetteer by Network Rail region | 1,640 | 76.1 KB
geofurlong_gazetteer_by_country_admin_area.csv | Gazetteer by Country and Administrative Area | 2,399 | 129.3 KB
geofurlong_gazetteer_by_nearest_place.csv | Gazetteer by Nearest Populated Place (and County / District) | 14,507 | 1.1 MB
geofurlong_gazetteer_aggregated.csv | Gazetteer (aggregated) | 18,544 | 1.6 MB
geofurlong_elr_by_country_admin_area.csv | ELRs within each Country and Administrative Area | 166 | 13.3 KB
geofurlong_elr_by_nearest_place.csv | ELRs with Nearest Populated Place (and County / District) | 10,416 | 361.3 KB
geofurlong_calibration_simplified.csv | Linear calibration (simplified) | 44,257 | 2.3 MB
geofurlong_calibration_full.csv | Linear calibration (full) | 44,257 | 5.4 MB
geofurlong_calibration_statistics.csv | Linear calibration statistics | 1,589 | 379.3 KB

Builder Technical Details

Software Stack

GeoFurlong is primarily developed in the Go programming language, delegating certain input and output geospatial file operations to Python scripts which utilise well-proven libraries. Input data files are in ESRI Shapefile format, intermediate files are in comma-separated values (CSV) format, and output files are predominantly SQLite databases (with geometry columns stored in well-known binary [WKB] format).

Process

  • Manual validation / preparation (see below).
  • Conversion of source geospatial data to optimised SQLite format: ELRs, Mileposts, Network Rail Regions, Ordnance Survey Administrative Areas, and Ordnance Survey Populated Places.
  • Calibrate mileposts along each ELR centre-line geometry to maximise linear positional accuracy.
  • Build optimised production database of ELRs and associated linear calibration.
  • Precompute geographic positions for all ELRs at multiple yardage intervals: 22, 110, 220, 440, 1760 (one mile), and 8800 (5 miles).
  • Build a gazetteer of railway positions combining Network Rail Region, Ordnance Survey Administrative Area and Populated Place datasets at multiple yardage intervals: 22, 110, 220, 440, 1760 (one mile), and 8800 (5 miles).
  • Build an aggregated gazetteer, based on 22 yard intervals.
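
The precompute step needs a list of mileages to evaluate for each ELR and interval. The sketch below generates them on the assumption that positions fall on whole multiples of the interval within the ELR's mileage range. That assumption is consistent with the 5654 yard sample (22 × 257), but it is not confirmed here, and the treatment of ELR start and finish mileages that are not multiples of the interval is unknown. Yardages is a hypothetical name.

```go
package main

import "fmt"

// Yardages returns every whole multiple of interval within [from, to].
// Alignment to multiples of the interval is an assumption, not a documented rule.
func Yardages(from, to, interval int) []int {
	if interval <= 0 || from > to {
		return nil
	}
	// Integer division truncates toward zero, so step up when that lands below from.
	first := from / interval * interval
	if first < from {
		first += interval
	}
	var out []int
	for y := first; y <= to; y += interval {
		out = append(out, y)
	}
	return out
}

func main() {
	// WCM1 starts at -216 yards in the ELR Schema sample.
	fmt.Println(Yardages(-216, -150, 22))
	fmt.Println(Yardages(0, 1760, 440))
}
```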

Source Data Preparation

The Scottish Region geometry from the Network Rail data source has been identified as being invalid due to it containing a self-intersecting ring. This has been manually corrected prior to the data import phase using QGIS.

Within the Ordnance Survey data source, several populated places are present which share an exact geographic position. These have been assessed manually, then removed prior to the data import phase, as noted in the table below.

Area | Deleted Place | Retained Place
Abertawe - Swansea | Mount Pleasant | Clydach
Cornwall | Toldish | Indian Queens
County Durham | Catchgate | Annfield Plain
Devon | Bishop's Clyst | Clyst St Mary
Dorset | Dudsbury | West Parley
Dorset | Pidney | Hazelbury Bryan
Dumfries and Galloway | Minnigaff | Newton Stewart
Fife | Town Centre | Glenrothes
Gloucestershire | South Woodchester | Woodchester
Kent | Boughton Street | Boughton Under Blean
Moray | Old Keith | Keith
North Lanarkshire | Garnqueen | Glenboig
North Lanarkshire | Wester Auchinloch | Auchinloch
Pen-y-bont ar Ogwr - Bridgend | Evanstown | Gilfach Goch
Somerset | Highbury | Coleford
Swindon | North Wroughton | Wroughton
Yorkshire and the Humber | Westfield | Brampton

Directory Structure

The builder process reads the environment variable GEOFURLONG_ROOT to define the root directory.

Directory | Contents
(root) | Configuration file
cmd/builder | Master builder program
data/cache | Serialised cache file
data/gazetteer | Railway points combined with location data at multiple intervals
data/import | ELR attributes file
data/import/foi_nr | Network Rail import files
data/import/foi_os | Ordnance Survey import files
data/precomputed | Railway geographic locations precomputed at multiple intervals
data/production | Database containing ELR centre-lines and calibration data
data/staging | Intermediate data files used during build process
pkg/geocode | Go support library files
scripts | Python and SQL support scripts
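
A client program could locate these directories from the same environment variable. The sketch below is a minimal example and is not taken from the builder source: ResolveDir is a hypothetical helper that joins the root with one of the subdirectories listed above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// ResolveDir joins a root directory and a slash-separated subdirectory such as
// "data/precomputed". It fails when root is empty.
func ResolveDir(root, sub string) (string, error) {
	if root == "" {
		return "", errors.New("GEOFURLONG_ROOT is not set")
	}
	return filepath.Join(root, filepath.FromSlash(sub)), nil
}

func main() {
	dir, err := ResolveDir(os.Getenv("GEOFURLONG_ROOT"), "data/precomputed")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(dir)
}
```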

Data Usage

For most applications, end-users will likely utilise the files contained in the data/precomputed or data/gazetteer directories, as these contain pre-computed geographic information for regular points (at multiple resolutions) along each ELR. These ready-made tabular files provide simple lookup access to geographic positions for ELRs and mileages without the need for any complex computation.
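
A simple lookup against one of these files might look like the sketch below. It assumes the CSV files are comma-delimited with a header row matching the column names in the Precomputed Schema, which is not confirmed here. Lookup and sampleReader are hypothetical names, and the inline sample stands in for a file such as geofurlong_precomputed_0022y.csv.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// sample holds one row using the Precomputed Schema columns and sample values.
const sample = `elr,total_yards,mileage,easting,northing,longitude,latitude,osgr,accuracy
ECM1,5654,3M 0374y,531412.3,187912.1,-0.105067,51.574767,TQ3141287912,-2
`

// sampleReader wraps the inline sample so the example runs without a file.
func sampleReader() io.Reader { return strings.NewReader(sample) }

// Lookup scans CSV rows for an ELR and total-yards value and returns the
// matching row keyed by column name. A header row is assumed.
func Lookup(r io.Reader, elr string, totalYards int) (map[string]string, error) {
	cr := csv.NewReader(r)
	header, err := cr.Read()
	if err != nil {
		return nil, err
	}
	col := make(map[string]int, len(header))
	for i, name := range header {
		col[name] = i
	}
	ei, okE := col["elr"]
	yi, okY := col["total_yards"]
	if !okE || !okY {
		return nil, errors.New("missing elr or total_yards column")
	}
	want := strconv.Itoa(totalYards)
	for {
		rec, err := cr.Read()
		if err == io.EOF {
			return nil, fmt.Errorf("%s at %d yards not found", elr, totalYards)
		}
		if err != nil {
			return nil, err
		}
		if rec[ei] == elr && rec[yi] == want {
			row := make(map[string]string, len(header))
			for i, name := range header {
				row[name] = rec[i]
			}
			return row, nil
		}
	}
}

func main() {
	row, err := Lookup(sampleReader(), "ECM1", 5654)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(row["easting"], row["northing"], row["osgr"])
}
```

For repeated lookups, loading the rows once into a map keyed by ELR and total yards would avoid rescanning the file.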

In addition to these files, developers may use the database in data/production, combined with the Go library files in pkg/geocode, for custom applications which compute the geographic position of an ELR and mileage combination dynamically. This library exposes functions to establish a point for a single mileage and a line substring for a mileage range. Client libraries for other programming languages are in progress to integrate with the database in data/production.
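
The geometric core of dynamic geocoding is finding the point at a given distance along a centre-line. The sketch below shows that step for a polyline of OS Easting / Northing vertices. It is a generic illustration and does not reproduce the pkg/geocode API: the XY type and PointAt function are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// XY is a vertex in OS National Grid metres (hypothetical type).
type XY struct{ E, N float64 }

// PointAt returns the point dist metres along line, clamped to its ends.
// The boolean is false when line has no vertices.
func PointAt(line []XY, dist float64) (XY, bool) {
	if len(line) == 0 {
		return XY{}, false
	}
	if dist <= 0 {
		return line[0], true
	}
	for i := 1; i < len(line); i++ {
		a, b := line[i-1], line[i]
		seg := math.Hypot(b.E-a.E, b.N-a.N)
		if seg > 0 && dist <= seg {
			t := dist / seg // fraction of the way along this segment
			return XY{a.E + t*(b.E-a.E), a.N + t*(b.N-a.N)}, true
		}
		dist -= seg
	}
	return line[len(line)-1], true
}

func main() {
	line := []XY{{0, 0}, {100, 0}, {100, 100}} // made-up geometry
	p, _ := PointAt(line, 150)
	fmt.Println(p.E, p.N) // 100 50
}
```

A line substring for a mileage range follows the same walk, collecting the interpolated start point, the intermediate vertices, and the interpolated end point.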

Key Definitions

Definition | Description
ELR | Engineer's Line Reference
NR | Network Rail
OS | Ordnance Survey

Credits

GeoFurlong is built upon a framework of open-source software applications and libraries, utilising portions of geospatial datasets which have been released under permissive licences.

Data

Software

  • Go programming language.
  • GDAL vector translator library.
  • SQLite database engine library.
  • orb 2D geometry library for Go.
  • go-sqlite3 SQLite database library for Go.
  • go-proj co-ordinate transformation library for Go.
  • Python programming language.
  • pandas data analysis library for Python.
  • GeoPandas and Shapely geospatial data manipulation libraries for Python.
  • Reference has been made to a Python library for manipulation of OS Grid References.
  • QGIS Geographic Information System.

To Do

  • Publish production database (SQLite format).
  • Publish geospatial databases at varying yardage intervals for precomputed and gazetteer tables (GeoPackage format).
  • Publish example mapping for multi-disciplinary and environmental datasets.
  • Publish example reverse-geocoded datasets.
  • Publish example data and video footage of mobile application.

Improved linear positioning accuracy could be obtained by utilising more recently surveyed milepost positions. Milepost positions are regularly surveyed as a matter of course during topographic surveys on the network. The process of collating and overriding milepost positions to improve the calibration accuracy is currently not within the scope of this project.

Disclaimer

The output data is provided as is, with no warranty of any kind, express or implied.

In no event shall the GeoFurlong author be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with this repository or data output.

Licence

All data outputs generated by the GeoFurlong tools are released under a permissive CC BY Creative Commons Licence. This licence allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The licence allows for commercial use; however, it should be noted that the GeoFurlong data is aggregated from Network Rail and Ordnance Survey data, both released under Open Government Licences. These contributing licences must be respected.

The project's source code is released under the permissive MIT Licence with a view to benefitting those working in the railway environment and fostering further innovation.

Author

Alan Morrison CEng MICE Eur Ing FPWI
