treigerm / waternet

A convolutional neural network that identifies water in satellite images.

License: MIT License

Python 100.00%


WaterNet

Using publicly available satellite imagery and OSM data we train a convolutional neural net to predict water occurrences in satellite images. WaterNet is not supposed to achieve state-of-the-art results but rather to be a simple example of a machine learning technique applied to geospatial data.

The picture is part of an example output of the classifier. The green parts are true positives, the red parts are false positives, the blue parts are false negatives and the rest are true negatives. With only 20 minutes of training I was able to train a classifier with 96.38% accuracy, 74.2% precision and 49.04% recall. As mentioned above, my goal was not to find the best classifier for this task but rather to give an example of a simple architecture for training a neural net on satellite data. I am certain that with a little more work it will be possible to create a significantly better classifier.
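The three metrics quoted above follow directly from the pixel counts behind the colours in the picture (green = true positives, red = false positives, blue = false negatives). The following is a minimal sketch of that arithmetic in plain Python, not the code WaterNet itself uses; the function names `confusion_counts` and `metrics` and the toy labels are made up for illustration.

```python
def confusion_counts(predicted, actual):
    """Count TP, FP, FN, TN for two equally long sequences of 0/1 labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = fn = tn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1      # water predicted, water present (green)
        elif p and not a:
            fp += 1      # water predicted, no water present (red)
        elif a:
            fn += 1      # water missed (blue)
        else:
            tn += 1      # correctly predicted "no water"
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall), using 0.0 for empty denominators."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy example: 10 pixels, 4 of which are water.
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
actual = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
accuracy, precision, recall = metrics(*confusion_counts(predicted, actual))
```

Because most pixels in a satellite image are not water, the true negatives dominate the accuracy, which is why accuracy can be high while recall stays low.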

Functionality

WaterNet can do the following things:

Train a neural network with GeoTIFF satellite images and OSM shapefiles
Create a visualisation of the neural net's predictions on the test dataset
Evaluate the neural net by calculating accuracy, precision and recall, as well as a precision-recall curve
Print hyperparameters of the neural net to a .txt file
Use tensorboard for logging
Save models with weights to disk for later usage
Choose between datasets
Run different computations separately, e.g. we can decide to only preprocess the data or only evaluate an already trained model
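To illustrate what the precision-recall curve mentioned in the list measures, here is a small sketch in plain Python. It assumes the network outputs one water probability per pixel and that a pixel counts as water when its score reaches the threshold; the function name `precision_recall_curve`, the scores and the thresholds are illustrative and not taken from WaterNet's code.

```python
def precision_recall_curve(scores, labels, thresholds):
    """Return a list of (threshold, precision, recall) triples.

    scores: predicted water probabilities, one per pixel
    labels: ground-truth labels (1 = water, 0 = no water)
    """
    curve = []
    for t in thresholds:
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if s >= t and y == 1)
        fp = sum(1 for s, y in pairs if s >= t and y == 0)
        fn = sum(1 for s, y in pairs if s < t and y == 1)
        # Convention chosen here: precision is 1.0 when nothing is predicted as water.
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        curve.append((t, precision, recall))
    return curve


scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]
curve = precision_recall_curve(scores, labels, thresholds=[0.3, 0.5, 0.7])
```

Raising the threshold makes the classifier more conservative: in this toy data recall falls from 1.0 to 1/3 as the threshold goes from 0.3 to 0.7.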

Installation

To run the program yourself you will need some actual satellite imagery and corresponding shapefiles to create the labels. For convenience I also included a Dockerfile.

Getting the data

I got my satellite imagery from USGS (you will need to register to download the data). The shapefiles are from Geofabrik. I've provided information on which data I used for training my classifier in Downloads.

Running it

First you will need to install Docker. You can find instructions on how to do this on their homepage.

Then build and start the container with

$ docker build -t water_net .
$ docker run -v /path/to/data:/data -it water_net /bin/bash

Here /path/to/data is the path to the data directory described in Data directory. In the container simply type

$ waterNet -h

to get information about how to use the program. If you haven't already created all the folders in the working and output directories you will also want to run

$ waterNet --setup

Data directory

I tried to follow Ali Eslami's great blog post about structuring your Machine Learning projects. Therefore I have a data directory which is split up into input, working and output directories. So my data directory looks like this:

/data
  /input
    /{satellite provider name}
    /Shapefiles
  /working
    /models
      /{model_id}
    /train_data
      /labels_images
      /tiles
      /water_bitmaps
      /WGS84_images
  /output
    /{model_id}
    /tensorboard
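
The working and output parts of this layout can also be created by hand. The following is only a sketch using Python's `pathlib`, not the code behind `waterNet --setup`; the function name `create_data_dirs` is hypothetical, and the `{model_id}` folders are left out on the assumption that they are created per model.

```python
from pathlib import Path

# Sub-directories of the data directory, taken from the layout above.
WORKING_AND_OUTPUT_DIRS = [
    "working/models",
    "working/train_data/labels_images",
    "working/train_data/tiles",
    "working/train_data/water_bitmaps",
    "working/train_data/WGS84_images",
    "output/tensorboard",
]


def create_data_dirs(data_root):
    """Create the working and output directories below data_root.

    Directories that already exist are left untouched.
    """
    root = Path(data_root)
    created = []
    for relative in WORKING_AND_OUTPUT_DIRS:
        path = root / relative
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_data_dirs("/data")` inside the container would then build the tree under the mounted data directory.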

All the metrics and hyperparameters of a model are stored in /output/{model_id}; the model weights are stored under /working/models/{model_id}. The model ID is a string consisting of the current timestamp, the dataset that is used and the neural net architecture. /output/tensorboard contains the logs for tensorboard. The last directory which might be of interest is /working/train_data/labels_images, which contains the visualisations of the water polygons for the satellite images. These images will be created if you run the script with the -v flag. The remaining directories in /working/train_data are used as caches for the preprocessing of the data.
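
As an illustration of how such a model ID and the two model-specific paths fit together, here is a small sketch. The exact format WaterNet uses is not documented here, so the timestamp format, the separator and the function names `make_model_id` and `model_paths` are assumptions.

```python
from datetime import datetime
from pathlib import PurePosixPath


def make_model_id(dataset, architecture, now=None):
    """Build an ID from timestamp, dataset and architecture (assumed format)."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return "{}_{}_{}".format(timestamp, dataset, architecture)


def model_paths(data_root, model_id):
    """Return (output_dir, weights_dir) for a model, following the layout above."""
    root = PurePosixPath(data_root)
    return root / "output" / model_id, root / "working" / "models" / model_id


model_id = make_model_id("sentinel-2", "two_layer")
output_dir, weights_dir = model_paths("/data", model_id)
```

Putting the timestamp first means that the model directories sort chronologically when listed.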

Downloads

Shapefiles

Downloads of shapefiles are provided here. These shapefiles do not contain large water bodies like the oceans. You can find shapefiles for the ocean polygons here. These are the regions I downloaded from Geofabrik (all in Europe):

Netherlands
England
Bayern (subregion of Germany)
Nordrhein-Westfalen (subregion of Germany)
Hungary
Nord-est (subregion of Italy)

After you have downloaded the shapefiles, place the extracted zip archives in the Shapefiles directory and remove the ".shp" extension from each folder name.
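
Renaming the folders can be scripted. The sketch below assumes that each extracted archive is a folder whose name ends in ".shp" (for example a hypothetical `netherlands-latest-free.shp`) and that only these top-level folders need renaming; the function name `strip_shp_suffix` is made up for illustration.

```python
from pathlib import Path


def strip_shp_suffix(shapefiles_dir):
    """Rename every folder 'name.shp' directly inside shapefiles_dir to 'name'.

    Files are never touched, so the actual .shp files inside the folders
    keep their extension. Returns the list of new folder paths.
    """
    renamed = []
    # sorted() materialises the listing before any folder is renamed.
    for entry in sorted(Path(shapefiles_dir).iterdir()):
        if entry.is_dir() and entry.name.endswith(".shp"):
            target = entry.with_name(entry.name[: -len(".shp")])
            if target.exists():
                continue  # do not overwrite an existing folder
            entry.rename(target)
            renamed.append(target)
    return renamed
```

For the layout described above this would be called as `strip_shp_suffix("/data/input/Shapefiles")`.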

Satellite images

You can download the satellite imagery from the USGS Earth Explorer. The following are the entity IDs of the images I used. To find images by their ID first select the right dataset (in our case Sentinel-2) and then go to "Additional criteria". Here are the IDs I used:

S2A_OPER_MSI_L1C_TL_SGS__20161204T105758_20161204T143433_A007584_T32ULC_N02_04_01
S2A_OPER_MSI_L1C_TL_SGS__20160908T110617_20160908T161324_A006340_T31UFU_N02_04_01
S2A_OPER_MSI_L1C_TL_SGS__20160929T103545_20160929T154211_A006640_T32UPU_N02_04_01
S2A_OPER_MSI_L1C_TL_SGS__20160719T112949_20160719T165219_A005611_T30UVE_N02_04_01
S2A_OPER_MSI_L1C_TL_SGS__20161115T101822_20161115T171903_A007312_T32TQR_N02_04_01
L1C_T30UXC_A007999_20170102T111441
S2A_OPER_MSI_L1C_TL_SGS__20161129T100308_20161129T134154_A007512_T33TYN_N02_04_01
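
The IDs above come in a long and a short form, but both contain a `T…` and an `A…` field. The sketch below pulls these two fields out with regular expressions. It assumes, based on the usual Sentinel-2 naming convention, that `T32ULC` is the tile identifier and `A007584` the orbit number; the function name `parse_entity_id` is hypothetical and this parsing is not part of WaterNet.

```python
import re

# "_T" followed by two digits and three capital letters, e.g. _T32ULC
TILE_PATTERN = re.compile(r"_T(\d{2}[A-Z]{3})(?=_|$)")
# "_A" followed by six digits, e.g. _A007584
ORBIT_PATTERN = re.compile(r"_A(\d{6})(?=_|$)")


def parse_entity_id(entity_id):
    """Extract the assumed tile and orbit fields from a Sentinel-2 entity ID.

    Returns a dict with keys 'tile' and 'orbit'; a value is None when the
    corresponding field is not found.
    """
    tile = TILE_PATTERN.search(entity_id)
    orbit = ORBIT_PATTERN.search(entity_id)
    return {
        "tile": tile.group(1) if tile else None,
        "orbit": orbit.group(1) if orbit else None,
    }


long_id = "S2A_OPER_MSI_L1C_TL_SGS__20161204T105758_20161204T143433_A007584_T32ULC_N02_04_01"
short_id = "L1C_T30UXC_A007999_20170102T111441"
parsed = [parse_entity_id(long_id), parse_entity_id(short_id)]
```

Requiring an underscore before the `T` keeps the `T` inside timestamps such as `20161204T105758` from being mistaken for a tile field.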

After you have downloaded the images, place them in a directory named Sentinel-2 under the input directory. Please take a look at the config.py file to see which shapefiles belong to which satellite images.

Pull requests welcome

I am new to geospatial analysis and writing machine learning code, so if you have ideas about how to improve this program you are more than welcome to open an issue or create a pull request!

Acknowledgements

DeepOSM from TrailBehind helped a lot to get started on this project. It also links to several useful articles and related projects.

Volodymyr Mnih's PhD thesis Machine Learning for Aerial Image Labeling was a great read and helped a lot to build the ConvNet architecture.


waternet's Issues

How to use WaterNet for other sat images

Hi Tim @treigerm,
Thanks very much for your code - very well written and easy to understand.
I have a similar question to @aasemaase. When I run waterNet it creates an interpreted version of one Sentinel image - S2A_OPER_MSI_L1C_TL_SGS__20161129T100308_20161129T134154_A007512.tif.
How do I run the CNN to get an output for the other Sentinel images which were downloaded?

Also how do I apply your code to other Sentinel images from other places in the world? I assume by editing the config.py file...

Also I could not get the shapefile save to work. The code seems to hang...

Thanks
Tony

Run without docker.

Is there a way to work/execute this project without docker?

Error when using evaluate model

I am successful with the config, preprocessing, init-model and train-model modules, but
the evaluate module causes the errors below. Any fixes?
C:\SAI\MachineLearning\treigerm\python waterNet.py -e

Start evaluating model.
Create GeoTIFF result files.
Traceback (most recent call last):
  File "waterNet.py", line 199, in <module>
    main()
  File "waterNet.py", line 195, in main
    model_dir, out_format=args.out_format)
  File "C:\SAI\MachineLearning\treigerm\waterNet\evaluation.py", line 37, in evaluate_model
    visualise_predictions(predicted_bitmap, labels, false_positives, tile_size, out_path, out_format=out_format)
  File "C:\SAI\MachineLearning\treigerm\waterNet\evaluation.py", line 73, in visualise_predictions
    visualise_results(results, tile_size, out_path, out_format=out_format)
  File "C:\SAI\MachineLearning\treigerm\waterNet\geo_util.py", line 155, in visualise_results
    sorted_by_path = sorted(results, key=get_path)
TypeError: <lambda>() missing 2 required positional arguments: 'pos' and 'path'

Command not found error

Thanks for sharing the tool. I get the error below after running the following command:
$ docker build -t water_net .

$ cd WaterNet-master/
$clear
$ls
Dockerfile		README.md		waterNet
LICENSE			requirements.txt	waterNet.py
$waterNet -h
-bash: waterNet: command not found
$

Error when running waterNet

Hello @treigerm,
I am an ML beginner. Thanks for sharing.
When running waterNet in Docker, I get the error below.
Can you help me solve this problem?

`root@7ba2f25c30a1:/usr/src/app# waterNet -p
Using TensorFlow backend.

Start preprocessing data.
Load tiles from /data/working/train_data/tiles/S2A_OPER_MSI_L1C_TL_SGS__20160908T110617_20160908T161324_A006340_T31UFU_N02_04_01_64.pickle.
Cache not available. Compute tiles.
Cache file does not exist. Please run again with -p flag.
`

Error while running waternet - no module named config

Hi @treigerm,

Whenever I run waterNet in Docker, I get the error below:

Traceback (most recent call last):
  File "waterNet.py", line 7, in <module>
    from waterNet.config import DATASETS, OUTPUT_DIR, TRAIN_DATA_DIR, LABELS_DIR
  File "/home/rohankar/Projects/tomtom/concept-dev/WaterNet/waterNet.py", line 7, in <module>
    from waterNet.config import DATASETS, OUTPUT_DIR, TRAIN_DATA_DIR, LABELS_DIR
ImportError: No module named config

Can you help?

Error in windows 10 - The term 'waterNet' is not recognized as the name of a cmdlet

Hello, I have installed Docker and ran the command below successfully:

PS C:\SAI\WaterNet-master> docker build -t water_net .

Running the command below immediately afterwards results in an error:

PS C:\SAI\WaterNet-master> waterNet -h

waterNet : The term 'waterNet' is not recognized as the name of a cmdlet, function, script file, or operable program.
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ waterNet -h
+ ~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (waterNet:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

Issue with lambda statement using python 3.5

Hi,
The lambda definition is:
get_path = lambda tiles, pos, path : path
The lambda call is:
sorted_by_path = sorted(labels, key=get_path)

A lambda with several positional parameters has to be called with all of its arguments, but here the following error is raised:
TypeError: <lambda>() missing 2 required positional arguments: 'pos' and 'path'

File: geo_util.py

I don't know if anyone will respond to my issue.
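
For readers hitting the same error: `sorted` calls the key function with exactly one argument, the list element itself, so a three-parameter lambda cannot serve as the key in Python 3. A possible fix, assuming each element is a `(tiles, pos, path)` tuple as the parameter names suggest, is to take one argument and index into it. The sample data below is made up for illustration.

```python
from operator import itemgetter

# Hypothetical stand-ins for the (tiles, pos, path) tuples in geo_util.py.
results = [
    ("tile_b", (0, 64), "image_2.tif"),
    ("tile_a", (0, 0), "image_1.tif"),
    ("tile_c", (64, 0), "image_1.tif"),
]

# The key function receives one tuple and returns its third field.
get_path = lambda item: item[2]
sorted_by_path = sorted(results, key=get_path)

# Equivalent and slightly more idiomatic: operator.itemgetter.
sorted_by_path_alt = sorted(results, key=itemgetter(2))

# The original three-parameter form fails because only one argument is passed.
broken_get_path = lambda tiles, pos, path: path
try:
    sorted(results, key=broken_get_path)
    raised = False
except TypeError:
    raised = True
```

Since `sorted` is stable, tuples that share a path keep their original relative order, which matters if later code groups consecutive tiles by image.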

The network does not converge

Hello,

First of all, thanks for putting this interesting project on GitHub and for making the effort to write a great README file.

I was trying to reproduce your project, so I followed the README and downloaded the same data that you used. After I set up the paths and preprocessed the data, I ran the training with the default values, but the network does not converge even with a large number of steps (e.g. 5000). I tried one-layer and two-layer architectures, and also tried changing the hyperparameters to those in https://github.com/treigerm/WaterNet/tree/trained_models.

Am I missing something?

Pretrained model?

Are there links for a pretrained model?

Questions to WaterNet

I am a beginner in both Python and TensorFlow and have tried to use your WaterNet. It worked on the same data as you used, but I had a little problem when trying to use "two_layer" and checkpoints. How do you set this?

I also have a question about how to use the model on new images. Do you do it with -e and put the images as test images in the config? Can you do this without a shapefile?

I also wanted to try it on my own images, but here both the shapefiles and the images are GeoTIFF. Is there a way to use them?