rustprooflabs / pgosm-flex Goto Github PK

View Code? Open in Web Editor NEW

96.0 96.0 17.0 24.63 MB

PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS (Postgres) using the osm2pgsql Flex output.

License: MIT License

Lua 1.65% PLpgSQL 97.14% Dockerfile 0.02% Shell 0.02% Python 1.08% Makefile 0.08%

openstreetmap osm2pgsql osm2pgsql-flex postgis postgres postgresql spatial-data

pgosm-flex's People

Contributors

Stargazers

Watchers

Forkers

andyprasetya kimballjohnson zhuomingliang xenotech81 jacopofar gispocoding joshbrooks koniwatson id10tapproved james-newtracs zkwurst harfang-nft petrell9 datadevopscloud lindangulopez ruhepuls robodrome

pgosm-flex's Issues

Details

Somehow I missed getting osm.road_point in the road.lua script. Table should end up with turning_loop, milestone, trailheads, etc.

Improve Place layer

Currently only grabbing the exact place tag. I expect this to also have records with boundary = administrative and admin_level = *

Add `innermost` column for nested polygon

Details

Follow up to #37. Initial structure can find the outer boundaries with nest_level = 1. Need innermost column since nesting levels can vary greatly.

This seems to do the trick, need to test on larger scale data before implementing. Will go into the table and index.

SELECT CASE WHEN osm_id_path[array_length(osm_id_path, 1)] = osm_id THEN True
            ELSE False
        END AS innermost,
        *
    FROM osm.place_polygon_nested 
;

Add run-road-and-places option

The run-road-and-places would include 3 sets:

road.lua
place.lua
pgosm_meta.lua

Roads and places are the most common layers I use and would be handy to have easily available. Also would provide a good example to point to of how to create smaller, reduced loads.

Typo in comment - km/kr instead of km/hr

Noticed during 3/31 webinar. In comment on maxspeed of road_line. Should check road_major for same issue.

Add Address-only objects to Buildings layer

Details

Reported by @thebf.

There are objects in OSM with addr:* tags but no other major tags. Without a building or other tag (shop/amenity/etc) they are simply being skipped. It's a valid use case to capture these. It seems building is appropriate grouping, considering the OSM wiki page on addresses: https://wiki.openstreetmap.org/wiki/Addresses. While some addresses may be simple entrances, POIs, or other, building seems the best catch-all.

Plan

Catch objects with any addr:* tag - but WITHOUT more specific tags exist (e.g. amenity, shop, leisure)
osm_type = 'address'
Update comment to describe the address OSM type since it is non-standard

Questions

Points only?
Points & Polygons?

Add noexit to traffic layer

The noexit=yes tag is an auxiliary detail, thinking the traffic layer makes most sense. Considered highway layer too, but keep falling back to adding to traffic.

https://wiki.openstreetmap.org/wiki/Key:noexit

Missed relationships in nested place polygons

Details

Noticed in D.C. extract, the suburbs aren't being nested.

Appears the join within the later join of the procedure is to blame.

Current snippet

WHERE ST_Within(p.geom, i.geom)
                    AND i.boundary = p.boundary
                    AND i.name IS NOT NULL

Expected fix

WHERE ST_Within(p.geom, i.geom)
                    AND i.name IS NOT NULL

Verify

Fix detected issues in D.C.
Double check Colorado, compare to https://blog.rustprooflabs.com/2021/01/pgosm-flex-improved-openstreetmap-places-postgis

Create Indoor layers

Details

All where tags -> 'indoor' IS NOT NULL

Invalid admin_level breaks nested place processing

Details

PgOSM-Flex version: 0.0.6-dev

Processing place.sql fails trying to load data for nested polygon processing. Has invalid text content, OSM Wiki indicates this should not be used.

Error:

ERROR:  invalid input syntax for type integer: "Community"

Example of way causing issue: https://www.openstreetmap.org/way/440611849

Path forward

Currently getting value from simple grab_tag.

local admin_level = object:grab_tag('admin_level')

Should create function in helpers.lua to grab and ensure integer. Consider setting value to -1 to make easy error report?

Restructure Lua/SQL files

As I look at #66 I see the flex-config directory is already a mess with too many files.

Thinking to restructure files into something like this:

./flex-config/
    run-all.lua
    run-all.sql
    lua/
        building.lua
        helpers.lua
        road.lua
    sql/
        building.sql
        helpers.sql
        road.sql

Unmaintained dependency - dkjson

Details

The dkjson package required to load the tags to JSONB (instead of HSTORE) is unmaintained (https://github.com/LuaDist/dkjson). Reported on #81, tracking here to avoid losing sight of this.

The dkjson package was suggested by the osm2pgsql project examples (see https://github.com/openstreetmap/osm2pgsql/blob/master/flex-config/places.lua#L5-L7), not sure what other alternatives/options may exist at this time.

Migrate additional PgOSM styles

Details

Layers with styles possibly with saving/migrating:

Landuse
Amenity
Misc POI (at least the parts in Infrastructure?)
What else?

Quality control query - Features not in a standard layer

Details

Thinking to add a db/qc/missing_features.sql. Answer the question: What's missing?

Join osm.tags to each table to find which features do not have matching records. Would not be deployed as part of any database but to make available to help development of this project.

Roads missing layer columns

The layer column is missing, also need bridge and tunnel.

Should update both road.lua and road_major.lua

Add region column to meta table

Details

Relates to my thought driving #76, setting up to try out partitioning OSM data by subregion. Use an env var to set region name (e.g. North America, Colorado, etc) in Lua. The Docker script should take the region parts from that run command and use to drive Lua appropriately with no additional action needed from user.

Default to 'Not Specified'

Create POI (Points of Interest) layer

Details

This idea was listed in the legacy PgOSM project's wish list, was not yet implemented.

Thoughts for moving forward:

Determine what features to include/exclude
Converts Line / Polygon layers to Point (ST_Centroid) (cannot currently achieve in .lua, will require point/line/polygon tables and a view/mv like the buildings layer)
PK: osm_id + geom_type
What about osm_type/osm_subtype?
~~Provide linkages to source tables in osm schema?~~ <-- Maybe bring up in the future, likely too complicated for v1

Columns for POI feature tables

Beyond the osm_id, geom_type, osm_type, osm_subtype` columns, some ideas:

name
operator
address parts
~~ele~~ <-- Decided to exclude for now, can get from tags table easy enough if needed.

POIs

Buildings (only if they have a name, operator, and/or address?)
Are really tall buildings by default a landmark? I would assume many have names/operators already, but...?
Shops
Amenities
Leisure
Parks
Artwork

Not POIs

Traffic control (stop sign, lights)
Landuse (residential/retail/etc)
Trees
what else?

Unsure

Rural landmarks (Water towers, Grain elevators)
Neighborhood / Suburb?
❓ What else??

Add Leisure layer

All features w/ leisure tag.

Improve helpers.get_name() - International support

Identified as an area of improvement during 3/31 webinar via question. Examples:

name:en
name:de

References

See:

Assumption

This improvement will involve an env var to set preferred language. e.g. user can define PGOSM_LANGUAGE=de to grab the name in German if it exists. Would still fall back to existing logic if specific language does not exist.

Add Landuse layer

All features w/ landuse tag.

Table to track PgOSM-Flex version and configurables

Details

Would like to make some meta-data available within the loaded schema.

PgOSM-Flex version
Date loaded
Customized SRID?
maybe others

Combine functions - Meters (probably but not always) to Meters

Currently helpers.lua has parse_ele() and building.lua has parse_building_height(). They do basically the same thing, make a generically named option and use where possible.

Error combining LineString and MultiLineString

Details

Layers affected

Roads
Road Major
Public Transport (see comment on #115)
Water

Notes

I think what I am doing should work based on osm2pgsql docs: https://osm2pgsql.org/doc/manual.html#geometry-transformations. Documenting the details blocking changes here with local commit refererences, will submit related issue to osm2pgsql project.

Specifically:

{ create = 'line' }. For ways or relations. Create a ‘linestring’ or ‘multilinestring’ geometry.

It works with Polygon/Multipolygon types but does not seem to be working the same on Linestring/Mulitlinestring values.

Version

Originally encountered in tagged 1.4.0 version of osm2pgsql. Built latest master branch to verify hasn't been resolved since.

osm2pgsql --version
2021-01-01 09:29:49  osm2pgsql version 1.4.0 (1.4.0-72-gc3eb0fb6)
Compiled using the following library versions:
Libosmium 2.15.6
Proj [API 4] Rel. 7.1.0, August 1st, 2020
Lua 5.3.3

Error notes

Started relation handling in commit for polygons. Ran into error on lines.

0877b26 - Last working commit w/ Polygon relations. Has line relation handling commented out
ac017d3 - Error, not converting LINESTRING to MULTILINESTRING - place.lua
a960455 - Error - Confirming roads suffer same fate - road.lua

Note the portion to add LINE place relations is commented out. That commit works.

Removing the block comment from place.lua to allow line relation processing caused error:

ERROR: DB copy thread failed: Ending COPY mode for 'place_line' failed: ERROR:  Geometry type (MultiLineString) does not match column type (LineString)
CONTEXT:  COPY place_line, line 6, column geom: "0105000020110F0000350000000102000020110F00000C00000077DE9C9F0F5C60C18783AC4A94F35141EB9199A40F5C60C1..."

Update table definition in same Lua script to multilinestring should fix:

{ column = 'geom',     type = 'multilinestring' , projection = srid},

Now the error message is flipped.

ERROR: DB copy thread failed: Ending COPY mode for 'place_line' failed: ERROR:  Geometry type (LineString) does not match column type (MultiLineString)
CONTEXT:  COPY place_line, line 1, column geom: "0102000020110F00000A0000000DAF04630C5A60C11BD8F1715AFF5141233BA5E3125A60C1D8D2A9BA48FF514146B4A0C713..."

Add switch to enable building nested polygons

Details

Currently using the Docker script to do this would require manually running this and then manually re-dumping the DB.
Assuming will add runtime option via env var to run this query before running pg_dump.

CALL osm.build_nested_admin_polygons();

Implement High Roads views

Details

Found the HighRoad project from the OSM wiki (here). This looks like an interesting abstraction for a specific purpose, likely worthy of updating and including based on the osm.road_line table.

Notes

Deploy script: https://github.com/migurski/HighRoad/blob/master/high_road_views-setup.pgsql

Note that script has outdated practices no longer applicable in modern PostGIS

SRID 900913
Manually managing geometry_columns

I have not used the project before, will require some experimentation and exploration to decide if that would be helpful in this project or not.

Add address details to amenity

A good portion of the amenity features have address details. Looking at Colorado, 15% have at least one form of address details. I plan to pull in details similar to building.lua or shop.lua.

SELECT COUNT(*) AS amenity_cnt,
        COUNT(*) FILTER (
            WHERE tags->>'addr:housenumber' IS NOT NULL
                OR tags->>'addr:street' IS NOT NULL
                OR  tags->>'addr:city' IS NOT NULL
                OR  tags->>'addr:state' IS NOT NULL
            ) AS amenity_with_addr
    FROM osm.tags
    WHERE tags ->> 'amenity' IS NOT NULL;

amenity_cnt|amenity_with_addr|
-----------|-----------------|
      81203|            12018|

Improve handling of major roads

Details

When run-all is used the road_major data purely duplicates data in road_line. road_line should calculate a column as a major indicator (with an index).

The road_major.lua script is handy but does not necessarily need to be loaded with run-all.lua. Group it in with unitable as being helpful and worthy to be in the project.

Create function in helpers to determine if major road or not
Update road.lua to add/calc major column
Update road_major.lua to use helper's function to determine if major
Update run-all.sql and run-all.lua
Update README - mention road_major along with unitable exclusion

Remove env var PGOSM_SCHEMA

Details

I added the PGOSM_SCHEMA option without realizing it would break all the SQL scripts. Currently the docs vaguely warn against doing it, but it should just be removed. If someone wants to change the schema name, it should be done post-import.

ALTER SCHEMA osm RENAME TO your_schema;

Docker script - Additional database structures

Details

Script docker/run_pgosm_flex.sh should sqitch deploy ... and load data/roads-us.sql.

Extra space before comma in address column

Noticed during webinar. There is an additional space before the comma after the city name.

Example

SELECT address
    FROM osm_co.shop_point 
    WHERE address <> ''
;

Example

NN 9th Street, Greeley , CO

Amenity tables missing PKs and indexes

Should have PK on osm_id and indexes on osm_type.

Add name cleanup to helpers.lua

Details

Add function to use logic based on Wiki preference
Update all (?) layers to use
Add comments to name column to reference data cleanup performed

This will help improve a number of layers and assist implementing #37.

Name tag preferences

From the OSM Wiki under Multiple names:

"If you have multiple names for a feature, first try to choose a rich semantic tag like any of the ones in the table (like short_name=, old_name=, etc.). If none of them works, choose the alt_name=* tag. If there are multiple names that do not fit, alt_name=* can be used with semicolons."

Performance with run-all-tags, run-no-tags, and run-road-place

Details

I did some initial testing of PgOSM using Geofabrik's North America and Europe regions (from Geofabrik) to compare against testing here: https://blog.rustprooflabs.com/2019/10/osm2pgsql-scaling
Used "Rig D" from the scaling post (Memory optimized droplet 128 GB RAM, 16 CPU, 400 GB SSD), and tested the two larger regions with the current PgOSM-flex (0.0.3) scripts. Previous testing with legacy output showed ~ 1 hour for N.America and ~ 4 hours for Europe.

Caveats

The "Scaling osm2pgsql" post only looked at the legacy osm2pgsql import and ignored the PgOSM transformations that I also had to run to prepare the data. Using osm2pgsql flex output combines the import and transformation into a single step. Depending on the area leading, the legacy PgOSM transformations added an additional 25-50% of the time of osm2pgsql.

PgOSM Flex 0.0.3 has (gut estimation) 60% of the layers defined of PgOSM Legacy. As more layers are added to PgOSM-Flex, it will (presumably) take longer.

PgOSM (Legacy) transformed data I never used.

Comparing the load time for an OSM extract form today against an extract from 14 months ago will always take longer. OSM data is ever-growing!

Versions Tested

Ubuntu 20.04
osm2pgsql 1.4.0
PostgreSQL 13.1
PostGIS 3.1.0

Initial observations

North America w/ pgosm-flex run-all.lua took a bit over 3 hours, roughly 3x longer than base comparison. Europe with run-no-tags.lua took a bit over 9 hours. (note, screenshots show a total elapsed of roughly 11 hours for Europe, those include creating additional indexes and running pg_dump too.). Following screenshots show the vast majority of the process is stuck using a single CPU.

Screenshots from Processing Europe

From Digital Ocean monitoring dashboard.

% disk used based on 400GB total.

Note: The drops at 09:00 and 11:30 were me manually deleting the N.America PBF, and then the osm_na schema, making room. By 13:00 I had a bet with myself that the processing stage would fail by running out of space (neared 100% around 14:30) but it squeaked by. Though, shame on me for not cleaning up my mess from prior testing before starting another round!

Next steps

Need a bit more testing, timing, and exploration. Not sure yet how much improvement can be had in the Lua scripts to improve performance, what can be done in osm2pgsql, etc.

Capture better load date representing data

Details

Currently the .sql script sets the ts value in osm.pgosm_flex table (by alter table ... add column .... If a historic PBF is imported it still gets marked as today, requires manual UPDATE query post-import to fix. Would be better to handle that during the initial processing steps instead of requiring an additional one.

Current best idea

Add check for env var PGOSM_DATE. If not set, today's date would be the default. Easy to implement, low overhead, easy to change later if desired.

Anticipated Steps

Add check for env var
Move ts creation to Lua
Implement default logic when env var doesn't exists
What else?

Other approaches considered

Look at the meta-data in the PBF for the latest dates, but files downloaded from Geofabrik public site (for one example) do not contain any timestamps. Technically would be possible from files requiring login, but a) I don't want to do that, and b) this idea would likely add significant overhead to processing in one form or another.

For Geofabrik, could scrape the download page at time of download. There is the text:

This file was last modified 1 day ago and contains all OSM data up to 2021-01-27T21:42:03Z. File size: 10.1 GB; MD5 sum: 1022690e4b7ab6b8529948b1e09d28f7.

Maybe this can be added on later (add on to the ENV VAR idea?) as a bonus. As an initial implementation, it ties this project to Geofabrik downloads. Not good for historic, not good for other sources of OSM data.

Add amenity layer

Add view for nested polygons

Details

The table osm.place_polygon_nested does not contain geometry. Create view to join back to the osm.place_polygon on osm_id to bring in the geometry.

Specific date should look for file with specific name

When a specific date is used (export PGOSM_DATE='2021-01-27') the docker based process should look for a file with that date in the name, not the -latest typically downloaded.

If the appropriately named file is not available let it fail and notify the user.

Setup GitHub actions

I have not done anything with GH actions before. Need to figure out how to set it up to:

Build the Docker image #104
Push to Docker hub on success #104
🚧 Test a known PBF file (D.C.?) #112
Fix #132

Resources

https://github.com/docker/build-push-action
https://github.com/VaultVulp/gp-docker-action

https://github.com/marketplace/actions/docker-run-action

Add view with nested admin boundaries

Idea and original proof of concept SQL queries from @verfriemelt-dot-org

More details coming soon pending ongoing conversation.

Create highway lookup table with class and speed limit

Details

Base new table on PgOSM routable.sql with improvements. https://github.com/rustprooflabs/pgosm/blob/master/db/data/routable.sql

Table definition

Name: pgosm.road

Columns:

osm_type (matches highway=*)
region (default = US)
route_motor (boolean)
route_foot (boolean)
maxspeed (in km/hr)
maxspeed_mph (generated column)

Add missing comments

Details

Queries to identify missing.

Schema - 100% missing- 1 (of 1)
Tables - 0% missing - 0 (of 32)
Views - 20% misisng -- 1 (of 5)
Columns - 73% missing - 201 (of 277)

SELECT * FROM dd.schemas WHERE s_name =  'osm' AND description IS NULL;

SELECT * FROM dd.tables WHERE s_name =  'osm' AND description IS NULL
    ORDER BY t_name;

SELECT * FROM dd.views WHERE s_name =  'osm' AND description IS NULL
    ORDER BY v_name;


SELECT * 
    FROM dd.columns
    WHERE s_name =  'osm' AND description IS NULL
    ORDER BY t_name, column_name;

Create layer for shops

Details

Mostly shop=* but includes some amenity, e.g. amenity=vending_machine and amenity=car_rental. Also vending=* (in case amenity=vending_machine is missing)

Instructions to get Lua dependencies outside Docker

Hello and thanks for the useful tool!

I'm trying to run the script on my machine (macOS 11.2), and when I start the script it complains about the missing dkjson package. I couldn't find instructions about installing Lua dependencies and don't know anything about this language, are there recommended ways to install the dependencies? dkjson seems to be not mantained anymore

Eventually I did this:

run brew install luarocks
luarocks install dkjson
LUA_PATH="?;?.lua;/usr/local/share/lua/5.4/?.lua;$(pwd)/style?"; osm2pgsql --slim --drop \ --output=flex --style=./run-all.lua [...]

it works, but seems cumbersome and I think I'm missing something

Decide how to document what goes in each layer

Details

PgOSM-Flex to date has comments on tables and columns. What it does not have is a way to easily see (without diving into Lua) how a layer was created. One positive aspect of the old method from PgOSM were the artifacts left behind in the pgosm schema providing self-documentation directly in the DB.

The old way

An example: What are footways?? (a sub-detail of the roads...)

Old PgOSM query.

SELECT code, subclass, osm_tag_filter
    FROM pgosm.layer_detail ld
    INNER JOIN pgosm.layer_group lg ON ld.layer_detail_id = ld.layer_detail_id 
    WHERE subclass = 'footway'
LIMIT 1

Results

code|subclass|osm_tag_filter                        |
----|--------|--------------------------------------|
5153|footway |highway='footway' OR foot='designated'|

New way?

Currently the roads does not deal well with examples such as foot='designated' . It also does not self-document the transformations directly in the DB, it's all in Lua.

An example of restricting records loaded in Lua for road_major.lua.

  -- Only major highways
  if not (object.tags.highway == 'motorway'
          or object.tags.highway == 'motorway_link'
          or object.tags.highway == 'primary'
          or object.tags.highway == 'primary_link'
          or object.tags.highway == 'secondary'
          or object.tags.highway == 'secondary_link'
          or object.tags.highway == 'tertiary'
          or object.tags.highway == 'tertiary_link'
          or object.tags.highway == 'trunk'
          or object.tags.highway == 'trunk_link')
          then
      return
  end

I don't see an easy way to document that accurately in DB w/out manual maintenance.

osm_type column missing from osm.vbuilding_all

Customize options via env vars

Initial options to make flexible:

schema name
SRID

Looks like os.get_env() in the helper.lua script should do the trick instead of defining those options per lua script.

Create Water/Waterway layers

Details

The natural layer currently has some (but not all) water related details. Planning to remove from natural and move into a water layer.

Tags from natural

These tags should be excluded from natural and included in water.

spring
lake
hot_spring
water
waterfall
wetland
swamp
water_meadow
waterway

Tags from waterway

Include all!

Add roads column / calculation to determine allowable modes of travel

I have traditionally maintained lookup tables as defined in 001.sql to determine which modes of travel are generally acceptable on different classifications of road. I think this would be better done in Lua.

In Lua there is the easy ability to leverage the full range of tags impacting this, e.g. foot=yes/no or cycle=yes/no in combination with footway, cycleway, path, etc. This likely will generate far better results then the simple lookup table method.

Add Docker run options

Details

Add options related to the output from pg_dump. Thinking of ability to easily create versioned data.

data_schema_only = False
data_schema_name = 'osm'

Expected usage

Setting data_schema_only = True would run pg_dump with --schema=data_schema_name and not include the pgosm schema.

Changing data_schema_name runs ALTER SCHEMA osm RENAME TO data_schema_name; before the pg_dump.

Nested polygons missing features

There are records with boundary IS NULL being excluded now that should be included. RE #37

osm_type IN ('neighborhood', 'city', 'suburb', 'town', 'admin_level', 'locality')

Change

INSERT INTO osm.place_polygon_nested (osm_id, name, osm_type, admin_level, geom)
SELECT p.osm_id, p.name, p.osm_type,
        COALESCE(p.admin_level::INT, 99) AS admin_level,
        geom
    FROM osm.vplace_polygon p
    WHERE p.boundary = 'administrative'
        AND p.name IS NOT NULL
;

INSERT INTO osm.place_polygon_nested (osm_id, name, osm_type, admin_level, geom)
SELECT p.osm_id, p.name, p.osm_type,
        COALESCE(p.admin_level::INT, 99) AS admin_level,
        geom
    FROM osm.vplace_polygon p
    WHERE (p.boundary = 'administrative'
            OR p.osm_type IN   ('neighborhood', 'city', 'suburb', 'town', 'admin_level', 'locality')
       )
        AND p.name IS NOT NULL
;

rustprooflabs / pgosm-flex Goto Github PK

pgosm-flex's People

Contributors

Stargazers

Watchers

Forkers

pgosm-flex's Issues

Details

Details

Details

Plan

Questions

Details

Current snippet

Expected fix

Verify

Details

Details

Path forward

Details

Details

Details

Details

Details

Columns for POI feature tables

POIs

Not POIs

Unsure

References

Assumption

Details

Details

Layers affected

Notes

Version

Error notes

Details

Details

Notes

Details

Details

Details

Details

Name tag preferences

Details

Caveats

Versions Tested

Initial observations

Screenshots from Processing Europe

Next steps

Details

Current best idea

Anticipated Steps

Other approaches considered

Details

Details

Table definition

Details

Details

Details

The old way

New way?

Details

Tags from natural

Tags from waterway

Details

Expected usage

Recommend Projects

Recommend Topics

Recommend Org