Giter Site home page Giter Site logo

geojson-to-sqlite's Introduction

geojson-to-sqlite

PyPI Changelog Tests License

CLI tool for converting GeoJSON to SQLite (optionally with SpatiaLite)

RFC 7946: The GeoJSON Format

How to install

$ pip install geojson-to-sqlite

How to use

You can run this tool against a GeoJSON file like so:

$ geojson-to-sqlite my.db features features.geojson

This will load all of the features from the features.geojson file into a table called features.

Each row will have a geometry column containing the feature geometry, and columns for each of the keys found in any properties attached to those features. (To bundle all properties into a single JSON object, use the --properties flag.)

The table will be created the first time you run the command.

On subsequent runs you can use the --alter option to add any new columns that are missing from the table.

You can pass more than one GeoJSON file, in which case the contents of all of the files will be inserted into the same table.

If your features have an "id" property it will be used as the primary key for the table. You can also use --pk=PROPERTY with the name of a different property to use that as the primary key instead. If you don't want to use the "id" as the primary key (maybe it contains duplicate values) you can use --pk '' to specify no primary key.

Specifying a primary key also will allow you to upsert data into the rows instead of insert data into new rows.

If no primary key is specified, a SQLite rowid column will be used.

You can use - as the filename to import from standard input. For example:

$ curl https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_040_00_20m.json \
    | geojson-to-sqlite my.db states - --pk GEO_ID

Using with SpatiaLite

By default, the geometry column will contain JSON.

If you have installed the SpatiaLite module for SQLite you can instead import the geometry into a geospatially indexed column.

You can do this using the --spatialite option, like so:

$ geojson-to-sqlite my.db features features.geojson --spatialite

The tool will search for the SpatiaLite module in the following locations:

  • /usr/lib/x86_64-linux-gnu/mod_spatialite.so
  • /usr/local/lib/mod_spatialite.dylib

If you have installed the module in another location, you can use the --spatialite_mod=xxx option to specify where:

$ geojson-to-sqlite my.db features features.geojson \
    --spatialite_mod=/usr/lib/mod_spatialite.dylib

You can create a SpatiaLite spatial index on the geometry column using the --spatial-index option:

$ geojson-to-sqlite my.db features features.geojson --spatial-index

Using this option implies --spatialite so you do not need to add that.

Streaming large datasets

For large datasets, consider using newline-delimited JSON to stream features into the database without loading the entire feature collection into memory.

For example, to load a day of earthquake reports from USGS:

$ geojson-to-sqlite quakes.db quakes tests/quakes.ndjson \
  --nl --pk=id --spatialite

When using newline-delimited JSON, tables will also be created from the first feature, instead of guessing types based on the first 100 features.

If you want to use a larger subset of your data to guess column types (for example, if some fields are inconsistent) you can use fiona to collect features into a single collection.

$ head tests/quakes.ndjson | fio collect | \
  geojson-to-sqlite quakes.db quakes - --spatialite

This will take the first 10 lines from tests/quakes.ndjson, pass them to fio collect, which turns them into a single feature collection, and pass that, in turn, to geojson-to-sqlite.

Using this with Datasette

Databases created using this tool can be explored and published using Datasette.

The Datasette documentation includes a section on how to use it to browse SpatiaLite databases.

The datasette-leaflet-geojson plugin can be used to visualize columns containing GeoJSON geometries on a Leaflet map.

If you are using SpatiaLite you will need to output the geometry as GeoJSON in order for that plugin to work. You can do that using the SpaitaLite AsGeoJSON() function - something like this:

select rowid, AsGeoJSON(geometry) from mytable limit 10

The datasette-geojson-map is an alternative plugin which will automatically render SpatiaLite geometries as a Leaflet map on the corresponding table page, without needing you to call AsGeoJSON(geometry).

geojson-to-sqlite's People

Contributors

chrislkeller avatar eyeseast avatar simonw avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

geojson-to-sqlite's Issues

Features are being excluded/missed after v0.2

I recently re-ran a project of mine and noticed a bug with geojson-to-sqlite.

After the conversion to a sqlite database I normally have 52K features. However, using the latest version I only have 48K features. Explicitly using v0.2 everything works fine (52K features as expected).

geojson-to-sqlite masks.db features masks.json --spatialite

Unfortunately I'm not permitted to share the geojson file/features. The file masks.json is a single line 262MB file with 52K features. All off the features in masks.json are Polygon's and look like the following...

{"type": "FeatureCollection", "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:OGC:1.3:CRS84"}}, "features": [{"type": "Feature", "properties": {"id": "843733", "mlLabel": "thing"}, "geometry": {"type": "Polygon", "coordinates": [[[..., ...], [..., ...], [..., ...], [..., ...]]]}},...

mod_spatialite error in Circle CI

https://circleci.com/gh/simonw/geojson-to-sqlite/18 says:

=================================== FAILURES ===================================
________________________ test_single_feature_spatialite ________________________

tmpdir = local('/tmp/pytest-of-circleci/pytest-0/test_single_feature_spatialite0')

    @pytest.mark.skipif(not utils.find_spatialite(), reason="Could not find SpatiaLite")
    def test_single_feature_spatialite(tmpdir):
        db_path = str(tmpdir / "output.db")
        result = CliRunner().invoke(
            cli.cli, [db_path, "features", str(testdir / "feature.geojson"), "--spatialite"]
        )
>       assert 0 == result.exit_code, result.stdout
E       AssertionError: 
E       assert 0 == 1
E        +  where 1 = <Result OperationalError('/usr/local/lib/mod_spatialite.dylib.so: cannot open shared object file: No such file or directory',)>.exit_code

tests/test_geojson_to_sqlite.py:47: AssertionError

I think it's this problem: valhalla/valhalla#1437

`--spatialite` fails if features have no properties

https://raw.githubusercontent.com/datanews/amtrak-geojson/master/amtrak-combined.geojson

Has a feature like this:

"properties": { }

This fails with an error if you use the --spatialite option:

geojson-to-sqlite /tmp/amtrak.db amtrak /tmp/amtrak-combined.geojson --spatialite                                           
Traceback (most recent call last):
  File "/Users/simon/.local/bin/geojson-to-sqlite", line 8, in <module>
    sys.exit(cli())
  File "/Users/simon/.local/pipx/venvs/geojson-to-sqlite/lib/python3.9/site-packages/click/core.py", line 1134, in __call__
    return self.main(*args, **kwargs)
  File "/Users/simon/.local/pipx/venvs/geojson-to-sqlite/lib/python3.9/site-packages/click/core.py", line 1059, in main
    rv = self.invoke(ctx)
  File "/Users/simon/.local/pipx/venvs/geojson-to-sqlite/lib/python3.9/site-packages/click/core.py", line 1401, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/simon/.local/pipx/venvs/geojson-to-sqlite/lib/python3.9/site-packages/click/core.py", line 767, in invoke
    return __callback(*args, **kwargs)
  File "/Users/simon/.local/pipx/venvs/geojson-to-sqlite/lib/python3.9/site-packages/geojson_to_sqlite/cli.py", line 32, in cli
    utils.import_features(
  File "/Users/simon/.local/pipx/venvs/geojson-to-sqlite/lib/python3.9/site-packages/geojson_to_sqlite/utils.py", line 69, in import_features
    db[table].create(column_types, pk=pk)
  File "/Users/simon/.local/pipx/venvs/geojson-to-sqlite/lib/python3.9/site-packages/sqlite_utils/db.py", line 863, in create
    self.db.create_table(
  File "/Users/simon/.local/pipx/venvs/geojson-to-sqlite/lib/python3.9/site-packages/sqlite_utils/db.py", line 517, in create_table
    self.execute(sql)
  File "/Users/simon/.local/pipx/venvs/geojson-to-sqlite/lib/python3.9/site-packages/sqlite_utils/db.py", line 236, in execute
    return self.conn.execute(sql)
sqlite3.OperationalError: near ")": syntax error

Progress bar

I've been loading some larger datasets lately, and it would be nice to have a progress bar.

(Happy to implement this at some point.)

--spatial-index and --nl do not play well together

Refs #17 and #13. This is the error:

% geojson-to-sqlite test.db quakes tests/quakes.ndjson --spatial-index --nl                          
CreateSpatialIndex() error: either "quakes"."geometry" isn't a Geometry column or a SpatialIndex is already defined

I added a test for this in 4cb5644 but the test passes even though the command-line execution fails.

Load multiple files in one call

Would be nice to be able to do this:

geojson-to-sqlite data.db states california.geojson oregon.geojson washington.geojson --spatialite --alter

Would be the equivalent of doing:

geojson-to-sqlite data.db states california.geojson --spatialite --alter
geojson-to-sqlite data.db states oregon.geojson --spatialite --alter
geojson-to-sqlite data.db states washington.geojson --spatialite --alter

Can you make it javascript version? Many people need it

I have sqlite file db running on browser with js.

I need this tool to convert user uploaded, or from URL geojson to sqlite file (in user browser)

How to do that?

if you can make a node version of it, we can browserify it and import it into browser javascript environment.

Support id keys on features

This used to be in the readme but I removed it because it wasn't implemented:

If your features have an "id" property it will be used as the primary key for the table.

Getting and error ... sqlite3.OperationalError: near ")"

I'm attempting to import the following GeoJSON which seems valid to me yet I get this syntax error, which seems to be on table creation, which seems to happen after JSON validation. I'm at a loss here, any ideas?

command
geojson-to-sqlite masks.db features data.json --spatialite

data.json

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              -96.8131837248802,
              33.0548974593963
            ],
            [
              -96.813183054328,
              33.0548974593963
            ],
            [
              -96.8131837248802,
              33.0548974593963
            ]
          ]
        ]
      }
    }
  ]
}

error

Traceback (most recent call last):
  File "/usr/local/bin/geojson-to-sqlite", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/geojson_to_sqlite/cli.py", line 34, in cli
    utils.import_features(
  File "/usr/local/lib/python3.8/dist-packages/geojson_to_sqlite/utils.py", line 55, in import_features
    db[table].create(column_types, pk=pk)
  File "/usr/local/lib/python3.8/dist-packages/sqlite_utils/db.py", line 863, in create
    self.db.create_table(
  File "/usr/local/lib/python3.8/dist-packages/sqlite_utils/db.py", line 517, in create_table
    self.execute(sql)
  File "/usr/local/lib/python3.8/dist-packages/sqlite_utils/db.py", line 236, in execute
    return self.conn.execute(sql)
sqlite3.OperationalError: near ")": syntax error

Also support KML?

Would it also be useful to support additional geodata types, eg KML?

I've used kml2geojson in the past to convert KML to the more convenient geojson format, although it probably has dependencies on an XML parser that would add weight to this package.

Option to store properties as a single JSON column

I've run into cases where I have a feature collection with very mixed fields, and I end up with lots of columns that are mostly empty. What I'd rather do, in those cases, is make a table with just three fields -- a primary key, properties as JSON and a geometry.

Create spatial indexes

Currently you have to remember to run this manually:

sqlite-utils --load-extension=spatialite locations.db 'select CreateSpatialIndex("locations", "geometry");'

"id" should come first in the table, not last

Noticed that tables created by this have their id column not-at-the-start, thanks to this code:

def yield_records(features, pk, spatialite):
for feature in features:
record = feature.get("properties") or {}
if spatialite:
record["geometry"] = shape(feature["geometry"]).wkt
else:
record["geometry"] = feature["geometry"]
if "id" in feature:
record["id"] = feature["id"]
yield record

Handle features with missing geometry

GeoJSON can be valid even if it's missing a geometry, but yield_records will throw a key error in that case:

def yield_records(features, pk, spatialite):
    for feature in features:
        record = {}
        if "id" in feature:
            record["id"] = feature["id"]
        record.update(feature.get("properties") or {})
        if spatialite:
            record["geometry"] = shape(feature["geometry"]).wkt
        else:
            record["geometry"] = feature["geometry"]
        yield record

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.