ad-freiburg / osm2rdf

Convert OpenStreetMap (OSM) data to RDF Turtle, including the object geometries and predicates geo:sfContains and geo:sfIntersects. Weekly updated downloads for the whole planet (~ 40 billion triples) and per country.

Home Page: https://osm2rdf.cs.uni-freiburg.de

License: GNU General Public License v3.0

CMake 3.25% C++ 95.36% Dockerfile 0.03% Makefile 0.49% Python 0.87%
openstreetmap osm turtle rdf-triples n-triples

osm2rdf's People

Contributors: hannahbast, lehmann-4178656ch, lorenzbuehmann, patrickbr

osm2rdf's Issues

Outdated RDF dumps

The page https://osm2rdf.cs.uni-freiburg.de/ states that the RDF dumps are updated weekly, but the HTTP headers show that the last dump was made on Thu, 06 Jan 2022 19:07:37 GMT.

curl -I https://osm2rdf.cs.uni-freiburg.de/ttl/cze.osm.ttl.bz2
HTTP/1.1 200 OK
Date: Sat, 09 Apr 2022 20:20:10 GMT
Server: nginx/1.18.0 (Ubuntu)
Content-Type: application/octet-stream
Content-Length: 1986104540
Last-Modified: Thu, 06 Jan 2022 19:07:37 GMT
ETag: "61d73df9-76618cdc"
Expires: Sat, 09 Apr 2022 21:20:10 GMT
Cache-Control: max-age=3600
Accept-Ranges: bytes

The same applies to other countries.
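A staleness check like the one above can be scripted for every per-country dump. Below is a minimal sketch using only the Python standard library; the header value and timestamp are taken from the curl output above, and fetching the header live would be an ordinary HTTP HEAD request via urllib.request (the helper `is_stale` is hypothetical, not part of osm2rdf):

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def is_stale(last_modified: str, checked_at: datetime, max_age: timedelta) -> bool:
    """True if an HTTP Last-Modified date is older than max_age at checked_at."""
    return checked_at - parsedate_to_datetime(last_modified) > max_age

# Values from the curl output above: a January dump checked in April.
last_modified = "Thu, 06 Jan 2022 19:07:37 GMT"
checked_at = datetime(2022, 4, 9, 20, 20, 10, tzinfo=timezone.utc)
print(is_stale(last_modified, checked_at, timedelta(weeks=1)))  # True
```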

Do you think it would be possible to make a fresh dump, or at least to remove the statement about weekly updates? I was quite confused at first when I uploaded the dump to the triple store and did not see any new data.

Thank you!

Docker image creation always runs benchmark

Hi,

I created the Docker image from source, and doing so runs the benchmarks, which from my point of view should be unnecessary. Looking at the Dockerfile I can see that just make is called, and make all does indeed trigger the benchmark computation.

Not sure if this is intended, but it makes the image creation unnecessarily slower than needed.

Convert OpenStreetMap wikimedia_commons tags to IRIs

In OpenStreetMap and OpenHistoricalMap, the wikimedia_commons key can be set to a page name on Wikimedia Commons. The page is typically a file description page in the File: namespace, but sometimes it’s a category in the Category: namespace or a gallery in the main namespace instead. It would be convenient to have osm:wikimedia_commons triples that point to sdc: entities for files. Categories and galleries are usually linked to Wikidata items in the same manner as Wikipedia articles, so I suppose there would be schema:about triples for those.

One use case is to associate map features with Commons images for a better sense of context. OSM-based applications like OsmAnd can fetch the image and associated licensing information using direct MediaWiki API calls because they’re working with a single map feature at a time. OpenHistoricalMap/issues#581 tracks something similar that would be built into the OHM website. But there’s also some value in being able to query for linked images en masse. For example, a query could return map features whose wikimedia_commons tag:

  • Refers to a nonexistent (possibly deleted) image
  • Refers to an image that depicts something completely different than the map feature or a photo that (in the context of OHM) was taken after the feature ceased to exist
  • Refers to a photo that was taken too long ago and should be replaced by something more recent
  • Refers to a photo that remains copyrighted locally, even though it’s in the public domain in the U.S., where Wikimedia Commons is hosted
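A first step toward such triples could be a small helper that turns the raw tag value into a Commons page IRI. This is a hypothetical sketch (`commons_iri` is not part of osm2rdf); resolving a File: page further to its sdc: entity would additionally require a MediaWiki API lookup, which is out of scope here:

```python
from urllib.parse import quote

def commons_iri(tag_value: str) -> str:
    """Turn a wikimedia_commons tag value into a Wikimedia Commons page IRI.

    Values may carry a File: or Category: prefix, or no prefix at all for a
    gallery in the main namespace. Spaces become underscores and the title
    is percent-encoded, mirroring MediaWiki's URL scheme.
    """
    title = tag_value.strip().replace(" ", "_")
    return "https://commons.wikimedia.org/wiki/" + quote(title, safe=":_()'")

print(commons_iri("File:Old Town Hall.jpg"))
# https://commons.wikimedia.org/wiki/File:Old_Town_Hall.jpg
```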

Do we need FOUR separate osm2rdf-specific prefixes?

There are currently four different osm2rdf-specific prefixes, namely:

@prefix osm2rdf: <https://osm2rdf.cs.uni-freiburg.de/rdf#> .
@prefix osm2rdfkey: <https://osm2rdf.cs.uni-freiburg.de/rdf/key#> .
@prefix osm2rdfgeom: <https://osm2rdf.cs.uni-freiburg.de/rdf/geom#> .
@prefix osm2rdfmember: <https://osm2rdf.cs.uni-freiburg.de/rdf/member#> .

Try it on Qlever

Is this useful or good for anything? At first glance, I find it unnecessarily complex, and we end up with four not-so-nice prefixes. My suggestion would be to just have the first one.

@patrickbr and @lehmann-4178656ch What do you think?

Write RDF stats

Add stats for each RDF-File:

E.g.

{
    "lines": 1337,
    "header": 42,
    "triples": 1295
}

This should be placed in a .stats.json file named like the normal output.
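A minimal sketch of such a stats writer, assuming osm2rdf's one-triple-per-line output style and treating @prefix declarations and comment lines as the header (`write_stats` and the exact header heuristic are assumptions, not existing code):

```python
import json

def write_stats(ttl_path: str) -> dict:
    """Count total lines, header lines (prefix declarations and comments),
    and triples in a Turtle dump, and write the result to a sibling
    .stats.json file. Assumes one triple per non-header, non-blank line,
    which holds for N-Triples-like output."""
    lines = header = triples = 0
    with open(ttl_path, encoding="utf-8") as f:
        for raw in f:
            lines += 1
            s = raw.strip()
            if s.startswith("@prefix") or s.startswith("#"):
                header += 1
            elif s:
                triples += 1
    stats = {"lines": lines, "header": header, "triples": triples}
    with open(ttl_path + ".stats.json", "w", encoding="utf-8") as out:
        json.dump(stats, out, indent=4)
    return stats
```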

cmake --install doesn't work

After building with cmake --build . --target osm2rdf, I wanted to properly install osm2rdf:

-- Install configuration: "Release"
CMake Error at benchmarks/vendor/google/benchmark/src/cmake_install.cmake:46 (file):
  file INSTALL cannot find
  "./osm2rdf/build/benchmarks/vendor/google/benchmark/src/libbenchmark.a":
  No such file or directory.

Add inner and outer members of boundary and multipolygon relations

This query shows that no boundary or multipolygon relation in the OSM Planet dataset has osmrel:members that are ways with the role inner or outer. The only members in the dataset are label and admin_centre nodes, subarea relations, and plenty of tagging mistakes. This makes it difficult to perform tasks such as:

  • Comparing the perimeter of a building that has a courtyard to the perimeter (P2547) property on Wikidata
  • Computing the perimeter of a boundary, for example to apply the Polsby–Popper compactness test to the boundary
  • Associating a disputed boundary claim line with a boundary relation
  • Finding murals on walls of buildings that have courtyards

Also, in this OSM discussion, I needed to access the ways that make up a boundary relation in order to determine the total set of ways that would be part of a proposed time zone relation. I had to drop down to Overpass, which has various recursing operators as well as a length() operator.

osm2rdf produces invalid value literals like "A1"^^xsd:integer

The current version of osm2rdf produces triples like this one (from https://download.geofabrik.de/asia/india-latest.osm.pbf):

osmnode:458972632 osmkey:admin_level "A1"^^xsd:integer .

Obviously, "A1" is not an integer, and the latest version of QLever no longer accepts such value literals (which is also the standard behavior). The obvious fix would be to check whether a string is really an integer, and if not, omit the type. The triple above would then become:

osmnode:458972632 osmkey:admin_level "A1" .
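The proposed check could look like this sketch (`tag_literal` is a hypothetical helper, not osm2rdf code); it types a value as xsd:integer only when the value matches the xsd:integer lexical space:

```python
import re

# Lexical space of xsd:integer: an optional sign followed by digits only.
XSD_INTEGER = re.compile(r"[+-]?[0-9]+")

def tag_literal(value: str) -> str:
    """Emit a typed literal only when the value really is an integer;
    otherwise fall back to a plain string literal, as suggested above."""
    if XSD_INTEGER.fullmatch(value):
        return f'"{value}"^^xsd:integer'
    return f'"{value}"'

print(tag_literal("2"))   # "2"^^xsd:integer
print(tag_literal("A1"))  # "A1"
```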

Docker build errors on M2 chip mac

The following error pops up when running the make docker-de command on an M2 Mac with Ventura 13.5.

error: the clang compiler does not support '-march=native' [clang-diagnostic-error]

To fix, add --platform linux/amd64 to docker-build command in Makefile:

docker-build:
	${DOCKER} build --platform linux/amd64 -t osm2rdf .

*wget should be added as a dependency to be able to build.

Problem with building OSM Europe with inner-outer-geoms

The run ended in the middle of constructing the non-reduced DAG, without an error message or anything. The following part of the log makes it look as if the program terminated normally, but of course it didn't. What's happening here?

bast@lena:osm-europe$ qlever download-data 

This is the "qlever" script, call without argument for help

Executing "download-data":

wget -nc -O osm-europe.pbf https://download.geofabrik.de/europe-latest.osm.pbf; time ( /local/data/osm2rdf/build/apps/osm2rdf osm-europe.pbf -o osm-europe.ttl --cache . --store-locations-on-disk --split-tag-key-by-semicolon ref | tee osm-europe.osm2rdf-log.txt )

Downloading data using DOWNLOAD_CMD from Qleverfile ...

File ‘osm-europe.pbf’ already there; not retrieving.
[2022-06-26 05:40:30.901] osm2rdf :: 75f47d4-dirty :: BEGIN
                          Config
                          --- I/O ---
                          Input:         "osm-europe.pbf"
                          Output:        "osm-europe.ttl.bz2"
                          Output format: qlever
                          Cache:         "/local/data/qlever/qlever-indices/osm-europe/."
                          --- Facts ---
                          Simplifying WKT
                          Simplifying wkt geometries with factor: 5
                          Dumping WKT with precision: 7
                          Tag-Keys split by semicolon: 
                                                    ref
                          --- Contains ---
                          --- Miscellaneous ---
                          Storing locations osmium locations on disk
                          --- OpenMP ---
                          Max Threads: 16
[2022-06-26 05:40:30.902] Free ram: 0.783966G/125.803G

[2022-06-26 05:40:30.903] OSM Pass 1 ... (Relations for areas)
[2022-06-26 05:41:27.682] ... done                                     ]   0% 
[======================================================================] 100% 

[2022-06-26 05:41:27.725] OSM Pass 2 ... (dump)
[======================================================================] 100% 
[2022-06-26 07:16:33.199] ... done reading (libosmium) and converting (libosmium -> osm2rdf)
[2022-06-26 07:16:33.199] areas seen:258692004 dumped: 258692004 geometry: 258692004
                          nodes seen:2984713721 dumped: 107927718 geometry: 107927718
                          relations seen:6183874 dumped: 6183296 geometry: 0
                          ways seen:361552080 dumped: 354195428 geometry: 354195428

[2022-06-26 07:16:33.199] Calculating contains relation ...

[2022-06-26 07:16:33.206]  Sorting 6000770 areas ... 
[2022-06-26 07:16:34.358]  ... done 
[2022-06-26 07:16:34.358]  Packing area r-tree with 6000770 entries ... 
[2022-06-26 07:16:41.242]  ... done
[2022-06-26 07:16:42.474]  Generating non-reduced DAG from 6000770 areas ... 
[=====================================>                 ]  68% [4136030/6000770]
real	270m46.969s
user	3934m0.286s
sys	218m14.121s

Data model doesn't exactly fit GeoSPARQL standard

Hi,

I used the converter, and according to the output, the generated data doesn't match the GeoSPARQL standard, see https://opengeospatial.github.io/ogc-geosparql/geosparql11/spec.html#_core

In particular, geo:hasGeometry should only point to a geometry object and not to the serialization directly. I know it might be annoying and it will introduce an additional triple per geometry, but this is just what the standard says.

Without this I guess some of the standard triple stores won't be able to make use of the geospatial data, e.g. for indexing I think.

If not already done, maybe you could mention this somewhere in your documentation at least?

Cheers,
Lorenz

Better GeoSPARQL conformity

We should fully conform to the GeoSPARQL standard for the types geo:SpatialObject and geo:Feature.

In particular, a geo:Feature must have the following properties: geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox

That is, osm:Nodes, osm:Ways, osm:Relations and osm:Areas should be of type geo:Feature and offer these properties.

As far as I understand it, all of the properties geo:hasGeometry, geo:hasDefaultGeometry, geo:hasCentroid and geo:hasBoundingBox must then point to an object of type geo:SpatialObject. These must implement geo:hasSize, geo:hasMetricSize, geo:hasLength, geo:hasMetricLength, geo:hasPerimeterLength, geo:hasMetricPerimeterLength, geo:hasArea, geo:hasMetricArea, geo:hasVolume and geo:hasMetricVolume.

So far, I don't see any problem with implementing this.

However, AFAIK (@lehmann-4178656ch, @Danysan1, please correct me), sfIntersects and sfContains should be properties between geo:SpatialObjects. This would mean that we cannot write queries like

SELECT ?osm_id ?hasgeometry WHERE {
  osmrel:1960198 ogc:sfContains ?osm_id .
  ?osm_id geo:hasGeometry/geo:asWKT ?hasgeometry 
}

anymore. They would then look like this:

SELECT ?osm_id ?hasgeometry WHERE {
  osmrel:1960198 geo:hasGeometry ?geoma .
  ?osm_id geo:hasGeometry ?geomb .
  ?geoma ogc:sfContains ?geomb .
  ?geomb geo:hasGeometry/geo:asWKT ?hasgeometry 
}

@hannahbast, @joka921, is that a problem?

See also ad-freiburg/qlever#678 (comment)

Way geometry intermediate objects in wrong namespace

The intermediate geometry objects to which the geo:asWKT predicates are attached should be in the NAMESPACE__OSM2RDF_GEOM namespace (osm2rdfgeom: in the generated .ttl). For ways, these are in the NAMESPACE__OSM2RDF namespace (osm2rdf: in the generated .ttl). See

NAMESPACE__OSM2RDF, "way_" + std::to_string(way.id()));

I might be wrong, but shouldn't https://github.com/ad-freiburg/osm2rdf/blob/80daa32e066ba5091a36ed078d37228662cbe7e7/src/osm/FactHandler.cpp#L245C13-L245C36 also be in NAMESPACE__OSM2RDF_GEOM?

Export contains invalid lines

The exported file https://osm2rdf.cs.uni-freiburg.de/ttl/cze.osm.ttl.bz2 (2022-03-06)
contains invalid lines, which lead (during a full import into Virtuoso) to the following error.

Message: TURTLE RDF loader, line 33407053: RDFGE: RDF box with a geometry RDF type and a non-geometry content

This is probably due to empty values inside the MULTIPOLYGON function, as can be seen in the following example.

PREFIX osmrel: <https://www.openstreetmap.org/relation/> 
PREFIX geo: <http://www.opengis.net/ont/geosparql#> 

INSERT DATA {
    GRAPH <http://example.com/osm/cze> {
        # This line is contained in TTL export
        osmrel:8291361 geo:hasGeometry "MULTIPOLYGON()"^^geo:wktLiteral . 
    }
}

A simple cat + grep expression shows that the exported file contains 4 invalid rows, but more may occur.

$ cat cze.osm.ttl | grep 'MULTIPOLYGON()'
osmrel:8291361 geo:hasGeometry "MULTIPOLYGON()"^^geo:wktLiteral .
osmway:201387026 geo:hasGeometry "MULTIPOLYGON()"^^geo:wktLiteral .
osmway:203215812 geo:hasGeometry "MULTIPOLYGON()"^^geo:wktLiteral .
osmway:200235542 geo:hasGeometry "MULTIPOLYGON()"^^geo:wktLiteral .
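Until the exporter is fixed, such files can be cleaned before import. A rough sketch (the regex is a heuristic keyed to the output format shown above; note that the WKT grammar spells an empty geometry MULTIPOLYGON EMPTY, which is why the non-standard MULTIPOLYGON() is rejected by strict loaders):

```python
import re

# Matches WKT literals with an empty coordinate list, e.g. "MULTIPOLYGON()".
EMPTY_WKT = re.compile(r'"[A-Z]+\s*\(\s*\)"\^\^geo:wktLiteral')

def drop_empty_geometries(lines):
    """Filter out triples whose WKT literal carries no coordinates."""
    return [line for line in lines if not EMPTY_WKT.search(line)]

sample = [
    'osmrel:8291361 geo:hasGeometry "MULTIPOLYGON()"^^geo:wktLiteral .',
    'osmrel:1 geo:hasGeometry "POINT(7.84 48.0)"^^geo:wktLiteral .',
]
print(drop_empty_geometries(sample))  # keeps only the POINT line
```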

Segfault during DAG build persists

Unfortunately, the latest weekly run crashed this morning with a segmentation fault during the DAG build for the USA dataset. This seems to be the same bug we encountered on the planet.osm dataset in autumn, and which we thought had been fixed. I am currently trying to reproduce this in gdb.

Correct handling of intersections

I see two issues with the intersection relations as they are currently implemented:

  1. The intersects_area relation is exactly equivalent to the contains_area relation (see https://github.com/ad-freiburg/osm2rdf/blob/master/src/osm/GeometryHandler.cpp#L453), which effectively means that intersects_area is only correct between areas in which one contains the other. To indeed handle all named area intersections, it would be necessary to again iterate over all named areas and check for intersections, again using the DAG for speedup (if A intersects B, then A intersects all geometries containing B in the DAG), exactly as is currently done for the ways (starting at https://github.com/ad-freiburg/osm2rdf/blob/master/src/osm/GeometryHandler.cpp#L873).

  2. Intersect relations are not symmetric. In particular, even the intersects_area relations that are currently implemented (see above) are not symmetric (this fix should be trivial, though). However, I am not so sure about the symmetry of intersections between different types. For example, if a way intersects an area, an intersects_nonarea relation is written between the area and the way, but the symmetric relation should of course not be written as intersects_nonarea, since the way intersects an area. I think, however, that an intersects_area relation should then be written (in line https://github.com/ad-freiburg/osm2rdf/blob/master/src/osm/GeometryHandler.cpp#L926) between the way and the area.

Any thoughts on this?
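For illustration, the two proposed fixes can be sketched together in Python (this is a toy model, not the actual C++ implementation): detected intersections are propagated up the containment DAG, and every relation is emitted in both directions:

```python
def intersect_triples(pairs, parents):
    """Sketch of both fixes: 'pairs' are the directly detected (a, b)
    intersections; 'parents' maps each area to the areas directly
    containing it (the containment DAG). If a intersects b, then a also
    intersects every ancestor of b, and each relation is emitted in both
    directions, since intersection is symmetric."""
    def ancestors(x):
        seen, stack = set(), list(parents.get(x, ()))
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(parents.get(p, ()))
        return seen

    out = set()
    for a, b in pairs:
        for target in {b} | ancestors(b):
            out.add((a, target))
            out.add((target, a))
    return out

# B lies inside C; a detected A-B intersection implies A-C as well,
# and each pair is written symmetrically.
print(sorted(intersect_triples([("A", "B")], {"B": ["C"]})))
```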

Substitute OSM keys which are concepts with a proper URI

It would be very useful for indexing, and for attaching further meaning, to have a URI instead of the plain OSM value for certain keys, e.g. <https://www.openstreetmap.org/wiki/Key:place> "country" .

Either proper concepts in the osm2rdf namespace are created on the fly, or, potentially more useful, the values of such keys are substituted directly with the fitting Wikidata concept. Wikidata lists the OSM keys as a property. The following query can extract the mapping: https://w.wiki/6MJT
(The mappings are not always distinct, e.g. https://www.wikidata.org/wiki/Q1007870 and https://www.wikidata.org/wiki/Q207694.)

This has the advantage that these tags automatically come with all translations and pictures, or pictograms.

The downside is that it is not clear how to keep the mapping up to date, given that both Wikidata and OSM are ever-changing targets. This might be resolved dynamically at conversion time.
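A sketch of the static-substitution variant, with a purely illustrative one-entry mapping (the real table would come from the Wikidata query linked above; Q6256 is the Wikidata item for "country", and `object_term` is a hypothetical helper):

```python
# Illustrative excerpt of a (key, value) -> Wikidata concept mapping; the
# real table would be extracted with the Wikidata query linked above.
OSM_VALUE_TO_WIKIDATA = {
    ("place", "country"): "http://www.wikidata.org/entity/Q6256",
}

def object_term(key: str, value: str) -> str:
    """Substitute a plain tag value with the matching Wikidata concept IRI,
    falling back to the literal value when no mapping is known."""
    iri = OSM_VALUE_TO_WIKIDATA.get((key, value))
    return f"<{iri}>" if iri else f'"{value}"'

print(object_term("place", "country"))  # <http://www.wikidata.org/entity/Q6256>
print(object_term("place", "hamlet"))   # "hamlet"
```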

Compiler warnings + add workflows

Compiling the current master on Ubuntu 22.04 produces many warnings. It's good practice to fix them, in particular to make sure that the latest compiler version runs through without errors.

I would suggest adapting https://github.com/ad-freiburg/qlever/tree/master/.github/workflows (maybe with fewer compiler and OS versions) and adding the corresponding badges (Docker build, Native build, Format check) to the GitHub README, like on https://github.com/ad-freiburg/qlever . This should be fairly straightforward.
