rustprooflabs / pgosm-flex Goto Github PK
View Code? Open in Web Editor NEWPgOSM Flex provides high quality OpenStreetMap datasets in PostGIS (Postgres) using the osm2pgsql Flex output.
License: MIT License
PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS (Postgres) using the osm2pgsql Flex output.
License: MIT License
Somehow I missed getting osm.road_point
in the road.lua
script. Table should end up with turning_loop, milestone, trailheads, etc.
Currently only grabbing the exact place tag. I expect this to also have records with boundary = administrative
and admin_level = *
Follow up to #37. Initial structure can find the outer boundaries with nest_level = 1
. Need innermost
column since nesting levels can vary greatly.
This seems to do the trick, need to test on larger scale data before implementing. Will go into the table and index.
SELECT CASE WHEN osm_id_path[array_length(osm_id_path, 1)] = osm_id THEN True
ELSE False
END AS innermost,
*
FROM osm.place_polygon_nested
;
The run-road-and-places
would include 3 sets:
road.lua
place.lua
pgosm_meta.lua
Roads and places are the most common layers I use and would be handy to have easily available. Also would provide a good example to point to of how to create smaller, reduced loads.
Noticed during 3/31 webinar. In comment on maxspeed
of road_line
. Should check road_major
for same issue.
Reported by @thebf.
There are objects in OSM with addr:*
tags but no other major tags. Without a building
or other tag (shop/amenity/etc) they are simply being skipped. It's a valid use case to capture these. It seems building
is appropriate grouping, considering the OSM wiki page on addresses: https://wiki.openstreetmap.org/wiki/Addresses. While some addresses may be simple entrances, POIs, or other, building seems the best catch-all.
addr:*
tag - but WITHOUT more specific tags exist (e.g. amenity
, shop
, leisure
)osm_type = 'address'
The noexit=yes
tag is an auxiliary detail, thinking the traffic layer makes most sense. Considered highway
layer too, but keep falling back to adding to traffic
.
Noticed in D.C. extract, the suburbs aren't being nested.
Appears the join within the later join of the procedure is to blame.
WHERE ST_Within(p.geom, i.geom)
AND i.boundary = p.boundary
AND i.name IS NOT NULL
WHERE ST_Within(p.geom, i.geom)
AND i.name IS NOT NULL
All where tags -> 'indoor' IS NOT NULL
PgOSM-Flex version: 0.0.6-dev
Processing place.sql
fails trying to load data for nested polygon processing. Has invalid text content, OSM Wiki indicates this should not be used.
Error:
ERROR: invalid input syntax for type integer: "Community"
Example of way causing issue: https://www.openstreetmap.org/way/440611849
Currently getting value from simple grab_tag.
local admin_level = object:grab_tag('admin_level')
Should create function in helpers.lua
to grab and ensure integer. Consider setting value to -1 to make easy error report?
As I look at #66 I see the flex-config
directory is already a mess with too many files.
Thinking to restructure files into something like this:
./flex-config/
run-all.lua
run-all.sql
lua/
building.lua
helpers.lua
road.lua
sql/
building.sql
helpers.sql
road.sql
The dkjson
package required to load the tags
to JSONB (instead of HSTORE) is unmaintained (https://github.com/LuaDist/dkjson). Reported on #81, tracking here to avoid losing sight of this.
The dkjson
package was suggested by the osm2pgsql project examples (see https://github.com/openstreetmap/osm2pgsql/blob/master/flex-config/places.lua#L5-L7), not sure what other alternatives/options may exist at this time.
Layers with styles possibly with saving/migrating:
Thinking to add a db/qc/missing_features.sql
. Answer the question: What's missing?
Join osm.tags
to each table to find which features do not have matching records. Would not be deployed as part of any database but to make available to help development of this project.
The layer
column is missing, also need bridge
and tunnel
.
Should update both road.lua
and road_major.lua
Relates to my thought driving #76, setting up to try out partitioning OSM data by subregion. Use an env var to set region name (e.g. North America, Colorado, etc) in Lua. The Docker script should take the region parts from that run command and use to drive Lua appropriately with no additional action needed from user.
Default to 'Not Specified'
This idea was listed in the legacy PgOSM project's wish list, was not yet implemented.
Thoughts for moving forward:
.lua
, will require point/line/polygon tables and a view/mv like the buildings layer)osm_id
+ geom_type
osm_type
/osm_subtype
?Beyond the osm_id
, geom_type
, osm_type,
osm_subtype` columns, some ideas:
All features w/ leisure tag.
Identified as an area of improvement during 3/31 webinar via question. Examples:
name:en
name:de
See:
This improvement will involve an env var to set preferred language. e.g. user can define PGOSM_LANGUAGE=de
to grab the name in German if it exists. Would still fall back to existing logic if specific language does not exist.
All features w/ landuse tag.
Would like to make some meta-data available within the loaded schema.
Currently helpers.lua
has parse_ele()
and building.lua
has parse_building_height()
. They do basically the same thing, make a generically named option and use where possible.
I think what I am doing should work based on osm2pgsql docs: https://osm2pgsql.org/doc/manual.html#geometry-transformations. Documenting the details blocking changes here with local commit refererences, will submit related issue to osm2pgsql project.
Specifically:
{ create = 'line' }. For ways or relations. Create a ‘linestring’ or ‘multilinestring’ geometry.
It works with Polygon/Multipolygon types but does not seem to be working the same on Linestring/Mulitlinestring values.
Originally encountered in tagged 1.4.0 version of osm2pgsql. Built latest master
branch to verify hasn't been resolved since.
osm2pgsql --version
2021-01-01 09:29:49 osm2pgsql version 1.4.0 (1.4.0-72-gc3eb0fb6)
Compiled using the following library versions:
Libosmium 2.15.6
Proj [API 4] Rel. 7.1.0, August 1st, 2020
Lua 5.3.3
Started relation handling in commit for polygons. Ran into error on lines.
place.lua
road.lua
Note the portion to add LINE
place relations is commented out. That commit works.
Removing the block comment from place.lua
to allow line relation processing caused error:
ERROR: DB copy thread failed: Ending COPY mode for 'place_line' failed: ERROR: Geometry type (MultiLineString) does not match column type (LineString)
CONTEXT: COPY place_line, line 6, column geom: "0105000020110F0000350000000102000020110F00000C00000077DE9C9F0F5C60C18783AC4A94F35141EB9199A40F5C60C1..."
Update table definition in same Lua script to multilinestring should fix:
{ column = 'geom', type = 'multilinestring' , projection = srid},
Now the error message is flipped.
ERROR: DB copy thread failed: Ending COPY mode for 'place_line' failed: ERROR: Geometry type (LineString) does not match column type (MultiLineString)
CONTEXT: COPY place_line, line 1, column geom: "0102000020110F00000A0000000DAF04630C5A60C11BD8F1715AFF5141233BA5E3125A60C1D8D2A9BA48FF514146B4A0C713..."
Currently using the Docker script to do this would require manually running this and then manually re-dumping the DB.
Assuming will add runtime option via env var to run this query before running pg_dump
.
CALL osm.build_nested_admin_polygons();
Found the HighRoad project from the OSM wiki (here). This looks like an interesting abstraction for a specific purpose, likely worthy of updating and including based on the osm.road_line
table.
Deploy script: https://github.com/migurski/HighRoad/blob/master/high_road_views-setup.pgsql
Note that script has outdated practices no longer applicable in modern PostGIS
geometry_columns
I have not used the project before, will require some experimentation and exploration to decide if that would be helpful in this project or not.
A good portion of the amenity features have address details. Looking at Colorado, 15% have at least one form of address details. I plan to pull in details similar to building.lua
or shop.lua
.
SELECT COUNT(*) AS amenity_cnt,
COUNT(*) FILTER (
WHERE tags->>'addr:housenumber' IS NOT NULL
OR tags->>'addr:street' IS NOT NULL
OR tags->>'addr:city' IS NOT NULL
OR tags->>'addr:state' IS NOT NULL
) AS amenity_with_addr
FROM osm.tags
WHERE tags ->> 'amenity' IS NOT NULL;
amenity_cnt|amenity_with_addr|
-----------|-----------------|
81203| 12018|
When run-all is used the road_major
data purely duplicates data in road_line
. road_line
should calculate a column as a major
indicator (with an index).
The road_major.lua
script is handy but does not necessarily need to be loaded with run-all.lua
. Group it in with unitable
as being helpful and worthy to be in the project.
road.lua
to add/calc major
columnroad_major.lua
to use helper's function to determine if majorrun-all.sql
and run-all.lua
I added the PGOSM_SCHEMA
option without realizing it would break all the SQL scripts. Currently the docs vaguely warn against doing it, but it should just be removed. If someone wants to change the schema name, it should be done post-import.
ALTER SCHEMA osm RENAME TO your_schema;
Script docker/run_pgosm_flex.sh
should sqitch deploy ...
and load data/roads-us.sql
.
Noticed during webinar. There is an additional space before the comma after the city name.
Example
SELECT address
FROM osm_co.shop_point
WHERE address <> ''
;
Example
NN 9th Street, Greeley , CO
Should have PK on osm_id
and indexes on osm_type
.
This will help improve a number of layers and assist implementing #37.
From the OSM Wiki under Multiple names:
"If you have multiple names for a feature, first try to choose a rich semantic tag like any of the ones in the table (like short_name=, old_name=, etc.). If none of them works, choose the alt_name=* tag. If there are multiple names that do not fit, alt_name=* can be used with semicolons."
I did some initial testing of PgOSM using Geofabrik's North America and Europe regions (from Geofabrik) to compare against testing here: https://blog.rustprooflabs.com/2019/10/osm2pgsql-scaling
Used "Rig D" from the scaling post (Memory optimized droplet 128 GB RAM, 16 CPU, 400 GB SSD), and tested the two larger regions with the current PgOSM-flex (0.0.3) scripts. Previous testing with legacy output showed ~ 1 hour for N.America and ~ 4 hours for Europe.
The "Scaling osm2pgsql" post only looked at the legacy osm2pgsql import and ignored the PgOSM transformations that I also had to run to prepare the data. Using osm2pgsql flex output combines the import and transformation into a single step. Depending on the area leading, the legacy PgOSM transformations added an additional 25-50% of the time of osm2pgsql.
PgOSM Flex 0.0.3 has (gut estimation) 60% of the layers defined of PgOSM Legacy. As more layers are added to PgOSM-Flex, it will (presumably) take longer.
PgOSM (Legacy) transformed data I never used.
Comparing the load time for an OSM extract form today against an extract from 14 months ago will always take longer. OSM data is ever-growing!
North America w/ pgosm-flex run-all.lua
took a bit over 3 hours, roughly 3x longer than base comparison. Europe with run-no-tags.lua
took a bit over 9 hours. (note, screenshots show a total elapsed of roughly 11 hours for Europe, those include creating additional indexes and running pg_dump
too.). Following screenshots show the vast majority of the process is stuck using a single CPU.
From Digital Ocean monitoring dashboard.
% disk used based on 400GB total.
Note: The drops at 09:00 and 11:30 were me manually deleting the N.America PBF, and then the
osm_na
schema, making room. By 13:00 I had a bet with myself that the processing stage would fail by running out of space (neared 100% around 14:30) but it squeaked by. Though, shame on me for not cleaning up my mess from prior testing before starting another round!
Need a bit more testing, timing, and exploration. Not sure yet how much improvement can be had in the Lua scripts to improve performance, what can be done in osm2pgsql, etc.
Currently the .sql
script sets the ts
value in osm.pgosm_flex
table (by alter table ... add column ...
. If a historic PBF is imported it still gets marked as today, requires manual UPDATE
query post-import to fix. Would be better to handle that during the initial processing steps instead of requiring an additional one.
Add check for env var PGOSM_DATE
. If not set, today's date would be the default. Easy to implement, low overhead, easy to change later if desired.
ts
creation to LuaLook at the meta-data in the PBF for the latest dates, but files downloaded from Geofabrik public site (for one example) do not contain any timestamps. Technically would be possible from files requiring login, but a) I don't want to do that, and b) this idea would likely add significant overhead to processing in one form or another.
For Geofabrik, could scrape the download page at time of download. There is the text:
This file was last modified 1 day ago and contains all OSM data up to 2021-01-27T21:42:03Z. File size: 10.1 GB; MD5 sum: 1022690e4b7ab6b8529948b1e09d28f7.
Maybe this can be added on later (add on to the ENV VAR idea?) as a bonus. As an initial implementation, it ties this project to Geofabrik downloads. Not good for historic, not good for other sources of OSM data.
The table osm.place_polygon_nested
does not contain geometry. Create view to join back to the osm.place_polygon
on osm_id
to bring in the geometry.
When a specific date is used (export PGOSM_DATE='2021-01-27'
) the docker based process should look for a file with that date in the name, not the -latest
typically downloaded.
If the appropriately named file is not available let it fail and notify the user.
I have not done anything with GH actions before. Need to figure out how to set it up to:
Resources
https://github.com/docker/build-push-action
https://github.com/VaultVulp/gp-docker-action
Idea and original proof of concept SQL queries from @verfriemelt-dot-org
More details coming soon pending ongoing conversation.
Base new table on PgOSM routable.sql
with improvements. https://github.com/rustprooflabs/pgosm/blob/master/db/data/routable.sql
Name: pgosm.road
Columns:
highway=*
)US
)boolean
)boolean
)Queries to identify missing.
SELECT * FROM dd.schemas WHERE s_name = 'osm' AND description IS NULL;
SELECT * FROM dd.tables WHERE s_name = 'osm' AND description IS NULL
ORDER BY t_name;
SELECT * FROM dd.views WHERE s_name = 'osm' AND description IS NULL
ORDER BY v_name;
SELECT *
FROM dd.columns
WHERE s_name = 'osm' AND description IS NULL
ORDER BY t_name, column_name;
Mostly shop=*
but includes some amenity, e.g. amenity=vending_machine
and amenity=car_rental
. Also vending=*
(in case amenity=vending_machine is missing)
Hello and thanks for the useful tool!
I'm trying to run the script on my machine (macOS 11.2), and when I start the script it complains about the missing dkjson package. I couldn't find instructions about installing Lua dependencies and don't know anything about this language, are there recommended ways to install the dependencies? dkjson seems to be not mantained anymore
Eventually I did this:
brew install luarocks
luarocks install dkjson
LUA_PATH="?;?.lua;/usr/local/share/lua/5.4/?.lua;$(pwd)/style?"; osm2pgsql --slim --drop \ --output=flex --style=./run-all.lua [...]
it works, but seems cumbersome and I think I'm missing something
PgOSM-Flex to date has comments on tables and columns. What it does not have is a way to easily see (without diving into Lua) how a layer was created. One positive aspect of the old method from PgOSM were the artifacts left behind in the pgosm
schema providing self-documentation directly in the DB.
An example: What are footways?? (a sub-detail of the roads...)
Old PgOSM query.
SELECT code, subclass, osm_tag_filter
FROM pgosm.layer_detail ld
INNER JOIN pgosm.layer_group lg ON ld.layer_detail_id = ld.layer_detail_id
WHERE subclass = 'footway'
LIMIT 1
Results
code|subclass|osm_tag_filter |
----|--------|--------------------------------------|
5153|footway |highway='footway' OR foot='designated'|
Currently the roads does not deal well with examples such as foot='designated'
. It also does not self-document the transformations directly in the DB, it's all in Lua.
An example of restricting records loaded in Lua for road_major.lua
.
-- Only major highways
if not (object.tags.highway == 'motorway'
or object.tags.highway == 'motorway_link'
or object.tags.highway == 'primary'
or object.tags.highway == 'primary_link'
or object.tags.highway == 'secondary'
or object.tags.highway == 'secondary_link'
or object.tags.highway == 'tertiary'
or object.tags.highway == 'tertiary_link'
or object.tags.highway == 'trunk'
or object.tags.highway == 'trunk_link')
then
return
end
I don't see an easy way to document that accurately in DB w/out manual maintenance.
Initial options to make flexible:
Looks like os.get_env() in the helper.lua
script should do the trick instead of defining those options per lua script.
The natural layer currently has some (but not all) water related details. Planning to remove from natural
and move into a water
layer.
These tags should be excluded from natural
and included in water
.
Include all!
I have traditionally maintained lookup tables as defined in 001.sql to determine which modes of travel are generally acceptable on different classifications of road. I think this would be better done in Lua.
In Lua there is the easy ability to leverage the full range of tags impacting this, e.g. foot=yes/no
or cycle=yes/no
in combination with footway, cycleway, path, etc. This likely will generate far better results then the simple lookup table method.
Add options related to the output from pg_dump
. Thinking of ability to easily create versioned data.
data_schema_only = False
data_schema_name = 'osm'
Setting data_schema_only = True
would run pg_dump
with --schema=data_schema_name
and not include the pgosm
schema.
Changing data_schema_name
runs ALTER SCHEMA osm RENAME TO data_schema_name;
before the pg_dump
.
There are records with boundary IS NULL
being excluded now that should be included. RE #37
osm_type IN ('neighborhood', 'city', 'suburb', 'town', 'admin_level', 'locality')
Change
INSERT INTO osm.place_polygon_nested (osm_id, name, osm_type, admin_level, geom)
SELECT p.osm_id, p.name, p.osm_type,
COALESCE(p.admin_level::INT, 99) AS admin_level,
geom
FROM osm.vplace_polygon p
WHERE p.boundary = 'administrative'
AND p.name IS NOT NULL
;
To
INSERT INTO osm.place_polygon_nested (osm_id, name, osm_type, admin_level, geom)
SELECT p.osm_id, p.name, p.osm_type,
COALESCE(p.admin_level::INT, 99) AS admin_level,
geom
FROM osm.vplace_polygon p
WHERE (p.boundary = 'administrative'
OR p.osm_type IN ('neighborhood', 'city', 'suburb', 'town', 'admin_level', 'locality')
)
AND p.name IS NOT NULL
;
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.