Giter Site home page Giter Site logo

rustprooflabs / pgosm-flex Goto Github PK

View Code? Open in Web Editor NEW
96.0 96.0 17.0 24.63 MB

PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS (Postgres) using the osm2pgsql Flex output.

License: MIT License

Lua 1.65% PLpgSQL 97.14% Dockerfile 0.02% Shell 0.02% Python 1.08% Makefile 0.08%
openstreetmap osm2pgsql osm2pgsql-flex postgis postgres postgresql spatial-data

pgosm-flex's People

Contributors

jacopofar avatar joshbrooks avatar petrell9 avatar rustprooflabs avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

pgosm-flex's Issues

Road point table missing

Details

Somehow I missed getting osm.road_point in the road.lua script. Table should end up with turning_loop, milestone, trailheads, etc.

Improve Place layer

Currently only grabbing the exact place tag. I expect this to also have records with boundary = administrative and admin_level = *

Add `innermost` column for nested polygon

Details

Follow up to #37. Initial structure can find the outer boundaries with nest_level = 1. Need innermost column since nesting levels can vary greatly.

This seems to do the trick, need to test on larger scale data before implementing. Will go into the table and index.

SELECT CASE WHEN osm_id_path[array_length(osm_id_path, 1)] = osm_id THEN True
            ELSE False
        END AS innermost,
        *
    FROM osm.place_polygon_nested 
;

Add run-road-and-places option

The run-road-and-places would include 3 sets:

  • road.lua
  • place.lua
  • pgosm_meta.lua

Roads and places are the most common layers I use and would be handy to have easily available. Also would provide a good example to point to of how to create smaller, reduced loads.

Add Address-only objects to Buildings layer

Details

Reported by @thebf.

There are objects in OSM with addr:* tags but no other major tags. Without a building or other tag (shop/amenity/etc) they are simply being skipped. It's a valid use case to capture these. It seems building is appropriate grouping, considering the OSM wiki page on addresses: https://wiki.openstreetmap.org/wiki/Addresses. While some addresses may be simple entrances, POIs, or other, building seems the best catch-all.

Plan

  • Catch objects with any addr:* tag - but WITHOUT more specific tags exist (e.g. amenity, shop, leisure)
  • osm_type = 'address'
  • Update comment to describe the address OSM type since it is non-standard

Questions

  • Points only?
  • Points & Polygons?

Missed relationships in nested place polygons

Details

Noticed in D.C. extract, the suburbs aren't being nested.

Appears the join within the later join of the procedure is to blame.

Current snippet

WHERE ST_Within(p.geom, i.geom)
                    AND i.boundary = p.boundary
                    AND i.name IS NOT NULL

Expected fix

WHERE ST_Within(p.geom, i.geom)
                    AND i.name IS NOT NULL

Verify

Invalid admin_level breaks nested place processing

Details

PgOSM-Flex version: 0.0.6-dev

Processing place.sql fails trying to load data for nested polygon processing. Has invalid text content, OSM Wiki indicates this should not be used.

Error:

ERROR:  invalid input syntax for type integer: "Community"

Example of way causing issue: https://www.openstreetmap.org/way/440611849

Path forward

Currently getting value from simple grab_tag.

local admin_level = object:grab_tag('admin_level')

Should create function in helpers.lua to grab and ensure integer. Consider setting value to -1 to make easy error report?

Restructure Lua/SQL files

As I look at #66 I see the flex-config directory is already a mess with too many files.

Thinking to restructure files into something like this:

./flex-config/
    run-all.lua
    run-all.sql
    lua/
        building.lua
        helpers.lua
        road.lua
    sql/
        building.sql
        helpers.sql
        road.sql

Migrate additional PgOSM styles

Details

Layers with styles possibly with saving/migrating:

  • Landuse
  • Amenity
  • Misc POI (at least the parts in Infrastructure?)
  • What else?

Quality control query - Features not in a standard layer

Details

Thinking to add a db/qc/missing_features.sql. Answer the question: What's missing?

Join osm.tags to each table to find which features do not have matching records. Would not be deployed as part of any database but to make available to help development of this project.

Add region column to meta table

Details

Relates to my thought driving #76, setting up to try out partitioning OSM data by subregion. Use an env var to set region name (e.g. North America, Colorado, etc) in Lua. The Docker script should take the region parts from that run command and use to drive Lua appropriately with no additional action needed from user.

Default to 'Not Specified'

Create POI (Points of Interest) layer

Details

This idea was listed in the legacy PgOSM project's wish list, was not yet implemented.


Thoughts for moving forward:

  • Determine what features to include/exclude
  • Converts Line / Polygon layers to Point (ST_Centroid) (cannot currently achieve in .lua, will require point/line/polygon tables and a view/mv like the buildings layer)
  • PK: osm_id + geom_type
  • What about osm_type/osm_subtype?
  • Provide linkages to source tables in osm schema? <-- Maybe bring up in the future, likely too complicated for v1

Columns for POI feature tables

Beyond the osm_id, geom_type, osm_type, osm_subtype` columns, some ideas:

  • name
  • operator
  • address parts
  • ele <-- Decided to exclude for now, can get from tags table easy enough if needed.

POIs

  • Buildings (only if they have a name, operator, and/or address?)
  • Are really tall buildings by default a landmark? I would assume many have names/operators already, but...?
  • Shops
  • Amenities
  • Leisure
  • Parks
  • Artwork

Not POIs

  • Traffic control (stop sign, lights)
  • Landuse (residential/retail/etc)
  • Trees
  • what else?

Unsure

  • Rural landmarks (Water towers, Grain elevators)
  • Neighborhood / Suburb?
  • ❓ What else??

Error combining LineString and MultiLineString

Details

Layers affected

  • Roads
  • Road Major
  • Public Transport (see comment on #115)
  • Water

Notes

I think what I am doing should work based on osm2pgsql docs: https://osm2pgsql.org/doc/manual.html#geometry-transformations. Documenting the details blocking changes here with local commit refererences, will submit related issue to osm2pgsql project.

Specifically:

{ create = 'line' }. For ways or relations. Create a ‘linestring’ or ‘multilinestring’ geometry.

It works with Polygon/Multipolygon types but does not seem to be working the same on Linestring/Mulitlinestring values.

Version

Originally encountered in tagged 1.4.0 version of osm2pgsql. Built latest master branch to verify hasn't been resolved since.

osm2pgsql --version
2021-01-01 09:29:49  osm2pgsql version 1.4.0 (1.4.0-72-gc3eb0fb6)
Compiled using the following library versions:
Libosmium 2.15.6
Proj [API 4] Rel. 7.1.0, August 1st, 2020
Lua 5.3.3

Error notes

Started relation handling in commit for polygons. Ran into error on lines.

  • 0877b26 - Last working commit w/ Polygon relations. Has line relation handling commented out
  • ac017d3 - Error, not converting LINESTRING to MULTILINESTRING - place.lua
  • a960455 - Error - Confirming roads suffer same fate - road.lua

Note the portion to add LINE place relations is commented out. That commit works.

Removing the block comment from place.lua to allow line relation processing caused error:

ERROR: DB copy thread failed: Ending COPY mode for 'place_line' failed: ERROR:  Geometry type (MultiLineString) does not match column type (LineString)
CONTEXT:  COPY place_line, line 6, column geom: "0105000020110F0000350000000102000020110F00000C00000077DE9C9F0F5C60C18783AC4A94F35141EB9199A40F5C60C1..."

Update table definition in same Lua script to multilinestring should fix:

{ column = 'geom',     type = 'multilinestring' , projection = srid},

Now the error message is flipped.

ERROR: DB copy thread failed: Ending COPY mode for 'place_line' failed: ERROR:  Geometry type (LineString) does not match column type (MultiLineString)
CONTEXT:  COPY place_line, line 1, column geom: "0102000020110F00000A0000000DAF04630C5A60C11BD8F1715AFF5141233BA5E3125A60C1D8D2A9BA48FF514146B4A0C713..."

Add switch to enable building nested polygons

Details

Currently using the Docker script to do this would require manually running this and then manually re-dumping the DB.
Assuming will add runtime option via env var to run this query before running pg_dump.

CALL osm.build_nested_admin_polygons();

Implement High Roads views

Details

Found the HighRoad project from the OSM wiki (here). This looks like an interesting abstraction for a specific purpose, likely worthy of updating and including based on the osm.road_line table.

Notes

Deploy script: https://github.com/migurski/HighRoad/blob/master/high_road_views-setup.pgsql

Note that script has outdated practices no longer applicable in modern PostGIS

  • SRID 900913
  • Manually managing geometry_columns

I have not used the project before, will require some experimentation and exploration to decide if that would be helpful in this project or not.

Add address details to amenity

A good portion of the amenity features have address details. Looking at Colorado, 15% have at least one form of address details. I plan to pull in details similar to building.lua or shop.lua.

SELECT COUNT(*) AS amenity_cnt,
        COUNT(*) FILTER (
            WHERE tags->>'addr:housenumber' IS NOT NULL
                OR tags->>'addr:street' IS NOT NULL
                OR  tags->>'addr:city' IS NOT NULL
                OR  tags->>'addr:state' IS NOT NULL
            ) AS amenity_with_addr
    FROM osm.tags
    WHERE tags ->> 'amenity' IS NOT NULL;
amenity_cnt|amenity_with_addr|
-----------|-----------------|
      81203|            12018|

Improve handling of major roads

Details

When run-all is used the road_major data purely duplicates data in road_line. road_line should calculate a column as a major indicator (with an index).

The road_major.lua script is handy but does not necessarily need to be loaded with run-all.lua. Group it in with unitable as being helpful and worthy to be in the project.

  • Create function in helpers to determine if major road or not
  • Update road.lua to add/calc major column
  • Update road_major.lua to use helper's function to determine if major
  • Update run-all.sql and run-all.lua
  • Update README - mention road_major along with unitable exclusion

Remove env var PGOSM_SCHEMA

Details

I added the PGOSM_SCHEMA option without realizing it would break all the SQL scripts. Currently the docs vaguely warn against doing it, but it should just be removed. If someone wants to change the schema name, it should be done post-import.

ALTER SCHEMA osm RENAME TO your_schema;

Extra space before comma in address column

Noticed during webinar. There is an additional space before the comma after the city name.

Example

SELECT address
    FROM osm_co.shop_point 
    WHERE address <> ''
;

Example

NN 9th Street, Greeley , CO

Add name cleanup to helpers.lua

Details

  • Add function to use logic based on Wiki preference
  • Update all (?) layers to use
  • Add comments to name column to reference data cleanup performed

This will help improve a number of layers and assist implementing #37.

Name tag preferences

From the OSM Wiki under Multiple names:

"If you have multiple names for a feature, first try to choose a rich semantic tag like any of the ones in the table (like short_name=, old_name=, etc.). If none of them works, choose the alt_name=* tag. If there are multiple names that do not fit, alt_name=* can be used with semicolons."

Performance with run-all-tags, run-no-tags, and run-road-place

Details

I did some initial testing of PgOSM using Geofabrik's North America and Europe regions (from Geofabrik) to compare against testing here: https://blog.rustprooflabs.com/2019/10/osm2pgsql-scaling
Used "Rig D" from the scaling post (Memory optimized droplet 128 GB RAM, 16 CPU, 400 GB SSD), and tested the two larger regions with the current PgOSM-flex (0.0.3) scripts. Previous testing with legacy output showed ~ 1 hour for N.America and ~ 4 hours for Europe.

Caveats

The "Scaling osm2pgsql" post only looked at the legacy osm2pgsql import and ignored the PgOSM transformations that I also had to run to prepare the data. Using osm2pgsql flex output combines the import and transformation into a single step. Depending on the area leading, the legacy PgOSM transformations added an additional 25-50% of the time of osm2pgsql.

PgOSM Flex 0.0.3 has (gut estimation) 60% of the layers defined of PgOSM Legacy. As more layers are added to PgOSM-Flex, it will (presumably) take longer.

PgOSM (Legacy) transformed data I never used.

Comparing the load time for an OSM extract form today against an extract from 14 months ago will always take longer. OSM data is ever-growing!

Versions Tested

  • Ubuntu 20.04
  • osm2pgsql 1.4.0
  • PostgreSQL 13.1
  • PostGIS 3.1.0

Initial observations

North America w/ pgosm-flex run-all.lua took a bit over 3 hours, roughly 3x longer than base comparison. Europe with run-no-tags.lua took a bit over 9 hours. (note, screenshots show a total elapsed of roughly 11 hours for Europe, those include creating additional indexes and running pg_dump too.). Following screenshots show the vast majority of the process is stuck using a single CPU.

Screenshots from Processing Europe

From Digital Ocean monitoring dashboard.

load-osm-eu-flex--v0 0 3--CPU

load-osm-eu-flex--v0 0 3--IO

load-osm-eu-flex--v0 0 3--MEM

% disk used based on 400GB total.

Note: The drops at 09:00 and 11:30 were me manually deleting the N.America PBF, and then the osm_na schema, making room. By 13:00 I had a bet with myself that the processing stage would fail by running out of space (neared 100% around 14:30) but it squeaked by. Though, shame on me for not cleaning up my mess from prior testing before starting another round!

load-osm-eu-flex--v0 0 3--DiskUsage

Next steps

Need a bit more testing, timing, and exploration. Not sure yet how much improvement can be had in the Lua scripts to improve performance, what can be done in osm2pgsql, etc.

Capture better load date representing data

Details

Currently the .sql script sets the ts value in osm.pgosm_flex table (by alter table ... add column .... If a historic PBF is imported it still gets marked as today, requires manual UPDATE query post-import to fix. Would be better to handle that during the initial processing steps instead of requiring an additional one.

Current best idea

Add check for env var PGOSM_DATE. If not set, today's date would be the default. Easy to implement, low overhead, easy to change later if desired.

Anticipated Steps

  • Add check for env var
  • Move ts creation to Lua
  • Implement default logic when env var doesn't exists
  • What else?

Other approaches considered

Look at the meta-data in the PBF for the latest dates, but files downloaded from Geofabrik public site (for one example) do not contain any timestamps. Technically would be possible from files requiring login, but a) I don't want to do that, and b) this idea would likely add significant overhead to processing in one form or another.

For Geofabrik, could scrape the download page at time of download. There is the text:

This file was last modified 1 day ago and contains all OSM data up to 2021-01-27T21:42:03Z. File size: 10.1 GB; MD5 sum: 1022690e4b7ab6b8529948b1e09d28f7.

Maybe this can be added on later (add on to the ENV VAR idea?) as a bonus. As an initial implementation, it ties this project to Geofabrik downloads. Not good for historic, not good for other sources of OSM data.

Add view for nested polygons

Details

The table osm.place_polygon_nested does not contain geometry. Create view to join back to the osm.place_polygon on osm_id to bring in the geometry.

Specific date should look for file with specific name

When a specific date is used (export PGOSM_DATE='2021-01-27') the docker based process should look for a file with that date in the name, not the -latest typically downloaded.

If the appropriately named file is not available let it fail and notify the user.

Add missing comments

Details

Queries to identify missing.

  • Schema - 100% missing- 1 (of 1)
  • Tables - 0% missing - 0 (of 32)
  • Views - 20% misisng -- 1 (of 5)
  • Columns - 73% missing - 201 (of 277)
SELECT * FROM dd.schemas WHERE s_name =  'osm' AND description IS NULL;

SELECT * FROM dd.tables WHERE s_name =  'osm' AND description IS NULL
    ORDER BY t_name;

SELECT * FROM dd.views WHERE s_name =  'osm' AND description IS NULL
    ORDER BY v_name;


SELECT * 
    FROM dd.columns
    WHERE s_name =  'osm' AND description IS NULL
    ORDER BY t_name, column_name;

Create layer for shops

Details

Mostly shop=* but includes some amenity, e.g. amenity=vending_machine and amenity=car_rental. Also vending=* (in case amenity=vending_machine is missing)

Instructions to get Lua dependencies outside Docker

Hello and thanks for the useful tool!

I'm trying to run the script on my machine (macOS 11.2), and when I start the script it complains about the missing dkjson package. I couldn't find instructions about installing Lua dependencies and don't know anything about this language, are there recommended ways to install the dependencies? dkjson seems to be not mantained anymore

Eventually I did this:

  • run brew install luarocks
  • luarocks install dkjson
  • LUA_PATH="?;?.lua;/usr/local/share/lua/5.4/?.lua;$(pwd)/style?"; osm2pgsql --slim --drop \ --output=flex --style=./run-all.lua [...]

it works, but seems cumbersome and I think I'm missing something

Decide how to document what goes in each layer

Details

PgOSM-Flex to date has comments on tables and columns. What it does not have is a way to easily see (without diving into Lua) how a layer was created. One positive aspect of the old method from PgOSM were the artifacts left behind in the pgosm schema providing self-documentation directly in the DB.

The old way

An example: What are footways?? (a sub-detail of the roads...)

Old PgOSM query.

SELECT code, subclass, osm_tag_filter
    FROM pgosm.layer_detail ld
    INNER JOIN pgosm.layer_group lg ON ld.layer_detail_id = ld.layer_detail_id 
    WHERE subclass = 'footway'
LIMIT 1

Results

code|subclass|osm_tag_filter                        |
----|--------|--------------------------------------|
5153|footway |highway='footway' OR foot='designated'|

New way?

Currently the roads does not deal well with examples such as foot='designated' . It also does not self-document the transformations directly in the DB, it's all in Lua.

An example of restricting records loaded in Lua for road_major.lua.

  -- Only major highways
  if not (object.tags.highway == 'motorway'
          or object.tags.highway == 'motorway_link'
          or object.tags.highway == 'primary'
          or object.tags.highway == 'primary_link'
          or object.tags.highway == 'secondary'
          or object.tags.highway == 'secondary_link'
          or object.tags.highway == 'tertiary'
          or object.tags.highway == 'tertiary_link'
          or object.tags.highway == 'trunk'
          or object.tags.highway == 'trunk_link')
          then
      return
  end

I don't see an easy way to document that accurately in DB w/out manual maintenance.

Create Water/Waterway layers

Details

The natural layer currently has some (but not all) water related details. Planning to remove from natural and move into a water layer.

Tags from natural

These tags should be excluded from natural and included in water.

  • spring
  • lake
  • hot_spring
  • water
  • waterfall
  • wetland
  • swamp
  • water_meadow
  • waterway

Tags from waterway

Include all!

Add roads column / calculation to determine allowable modes of travel

I have traditionally maintained lookup tables as defined in 001.sql to determine which modes of travel are generally acceptable on different classifications of road. I think this would be better done in Lua.

In Lua there is the easy ability to leverage the full range of tags impacting this, e.g. foot=yes/no or cycle=yes/no in combination with footway, cycleway, path, etc. This likely will generate far better results then the simple lookup table method.

Add Docker run options

Details

Add options related to the output from pg_dump. Thinking of ability to easily create versioned data.

  • data_schema_only = False
  • data_schema_name = 'osm'

Expected usage

Setting data_schema_only = True would run pg_dump with --schema=data_schema_name and not include the pgosm schema.

Changing data_schema_name runs ALTER SCHEMA osm RENAME TO data_schema_name; before the pg_dump.

Nested polygons missing features

There are records with boundary IS NULL being excluded now that should be included. RE #37

  • osm_type IN ('neighborhood', 'city', 'suburb', 'town', 'admin_level', 'locality')

Change

INSERT INTO osm.place_polygon_nested (osm_id, name, osm_type, admin_level, geom)
SELECT p.osm_id, p.name, p.osm_type,
        COALESCE(p.admin_level::INT, 99) AS admin_level,
        geom
    FROM osm.vplace_polygon p
    WHERE p.boundary = 'administrative'
        AND p.name IS NOT NULL
;

To

INSERT INTO osm.place_polygon_nested (osm_id, name, osm_type, admin_level, geom)
SELECT p.osm_id, p.name, p.osm_type,
        COALESCE(p.admin_level::INT, 99) AS admin_level,
        geom
    FROM osm.vplace_polygon p
    WHERE (p.boundary = 'administrative'
            OR p.osm_type IN   ('neighborhood', 'city', 'suburb', 'town', 'admin_level', 'locality')
       )
        AND p.name IS NOT NULL
;

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.